March 1954


JOURNAL OF THE BIOMETRIC SOCIETY


The Design of Chemical Experiments D. R. Read


The Exploration and Exploitation of Response
Surfaces: Some General Considerations
and Examples G. E. P. Box


Doubly Balanced Incomplete Block Designs
for Experiments in which the Treatment
Effects are Correlated Lyle D. Calvin


Fixed-Sample-Size Analysis of Sequential
Observations F. J. Anscombe


The Combination of Estimates from
Different Experiments W. G. Cochran


The Analysis of Variance with Various
Binomial Transformations R. A. Fisher


Suggested Desk Calculator Operations for
Computing Moments by the Row F. M. Hemphill


Vol. 10 No. 1

The Biometric Society


FOUNDED BY THE BIOMETRICS SECTION OF THE AMERICAN STATISTICAL ASSOCIATION 


TABLE OF CONTENTS

Volume 10, Number 1, March 1954


The Design of Chemical Experiments . . . D. R. Read

The Exploration and Exploitation of Response Surfaces:
Some General Considerations and Examples . . . G. E. P. Box

Doubly Balanced Incomplete Block Designs for Experiments
in which the Treatment Effects are Correlated . . . Lyle D. Calvin

Fixed-Sample-Size Analysis of Sequential Observations . . . F. J. Anscombe

The Combination of Estimates from Different Experiments . . . W. G. Cochran

The Analysis of Variance with Various Binomial Transformations . . . R. A. Fisher

Discussion of Article by R. A. Fisher

Suggested Desk Calculator Operations for Computing Moments . . . F. M. Hemphill

Queries

Abstracts


Material for Biometrics should be addressed to Miss Gertrude Cox, Institute of
Statistics, Box 5457, Raleigh, North Carolina, except that authors residing in one of
the following organized regions can expedite the handling of their papers by
submitting them to the Assistant Editor for that region.

British Region: Dr. D. J. Finney, 6 Keble Road, Oxford, England; Australasian
Region: Dr. E. A. Cornish, University of Adelaide, Adelaide, Australia; French
Region: Dr. Georges Teissier, Faculte des Sciences de Paris, 1 rue V. Cousin, Paris,
France.

Material for Queries should go to Professor G. W. Snedecor, Statistical Laboratory, 
Iowa State College, Ames, Iowa. 


Articles to be considered for publication should be submitted in triplicate. 


THE BIOMETRIC SOCIETY 

General Officers 
President, W. G. Cochran; Secretary-Treasurer, C. I. Bliss; Council, H. C. Batson, 
L. L. Cavalli-Sforza, Georges Darmois, C. W. Emmens, D. J. Finney, Sir Ronald 
Fisher, J. O. Irwin, Arthur Linder, P. C. Mahalanobis, Donald Mainland, Leopold 
Martin, A. M. Mood, C. R. Rao, Georges Teissier, J. W. Tukey, Frank Yates, 
W. J. Youden.

Regional Officers 


Eastern North American Region: Regional President, S. L. Crump; Secretary-Treasurer,
A. M. Dutton. British Region: Regional President, R. R. Race; Secretary,
E. C. Fieller; Treasurer, A. R. G. Owen. Western North American Region: Regional
President, D. G. Chapman; Secretary-Treasurer, Elizabeth Vaughan. Australasian
Region: Regional President, Helen N. Turner; Secretary, W. B. Hall; Treasurer,
Mary A. Whitehead. French Region: Regional President, Georges Darmois;
Secretary-Treasurer, Daniel Schwartz. Belgian Region: Regional President, Paul Spehl;
Secretary, Leopold Martin; Treasurer, Claude Panier. Italian Region: Regional 
President, C. Barigozzi; Secretary, L. L. Cavalli-Sforza; Treasurer, R. Scossiroli. 


National Secretaries 
Denmark, N. F. Gjeddebaek; The Netherlands, .. van der Laan; India, V. G. Panse; 
Germany, Maria-Pia Geppert; Japan, M. Hatamura; Switzerland, Arthur Linder; 
Sweden, H. O. A. Wold; Brazil, Americo Groszmann. 
Editorial Board 
Biometrics 


Editor: Gertrude M. Cox; Assistant Editors and Committee Members: C. I. Bliss, 
Irwin Bross, E. A. Cornish, W. J. Dixon, Mary Elveback, Ralph Bradley, D. J. 
Finney, S. Lee Crump, Leopold Martin, K. R. Nair, Horace W. Norton, H. Fairfield 
Smith, G. W. Snedecor and Georges Teissier. Managing Editor: Sarah P. Carroll. 


The Biometric Society is an international society devoted to the mathematical and statistical 
aspects of biology and welcomes to membership biologists, mathematicians, statisticians and others who 
are interested in its objectives. Through its regional organizations the Society sponsors regional and 
local meetings. National secretaries serve the interest of members in Denmark, the Netherlands, India, 
Germany, Japan, Sweden and Brazil and there are many members “at large”. Dues in the Society for
1954 for residents of the Western Hemisphere are as follows: Full membership including subscription to 
Biometrics is $7.00. Members of the Biometrics Section of the American Statistical Association who 
subscribe to the journal through that organization may become members of The Biometric Society on 
the payment of $3.00 annual dues. For members in other parts of the world, full membership including 
subscription to Biometrics is $4.50, except that members who subscribe to the journal through the 
American Statistical Association pay annual dues of $1.75. Information concerning the Society can be 
obtained from the Secretary, The Biometric Society, Drawer 1106, New Haven 4, Connecticut, U.S.A. 

Annual subscription rates to non-members are as follows: For American Statistical Association 
Members, $4.00; for subscribers, non-members of either American Statistical Association or The Biometric
Society, $7.00. Subscriptions should be sent to the Managing Editor, Biometrics, P. O. Box
5457, Raleigh, North Carolina, U.S.A. 


Entered as second-class matter at the Post Office at New Haven, Conn., under 
the Act of March 3, 1879. Additional entry at Richmond, Va. Business Office, 
52 Hillhouse Ave., New Haven, Conn. Biometrics is published quarterly, in March,
June, September and December.


THE DESIGN OF CHEMICAL EXPERIMENTS 


D. R. Read


Research & Development Dept., 
The Distillers Co. Ltd., 
Great Burgh, Epsom, Surrey 


INTRODUCTION 


The subject of the design of chemical experiments, sometimes 
treated under the wider heading of “statistical methods in chemistry”
or “statistical methods in the chemical industry”, has already been 
dealt with from various points of view in a number of papers read and 
discussed or published during the last five years or so (and it may 
be recalled that a paper on this subject was given by Dr. O. L. Davies 
at the Second International Biometric Conference in 1949); so that in 
considering what would be the optimum scope of the present contribution 
it seemed that it would be more profitable to concentrate attention 
on one particular class of problems and to recount some of the more 
recent work on experimental design, rather than to make a general 
survey of the whole field. 

The present contribution begins with a summary of the main features 
that are more or less peculiar to chemical (as contrasted with biological) 
experimentation. This is followed by a description of methods that 
have been developed fairly recently by Box & Wilson (1951) for the 
study of chemical processes, with special reference to the determination 
of optimum conditions; and the paper ends with an example of the use 
of these methods in a typical development problem. Bearing in mind 
that it would in any event contain no novelty as regards basic statistical 
methodology (which has already been described elsewhere in the 
literature), this account has been framed mainly as an address to the 
experimenter rather than to the mathematical statistician, and a 
relatively non-mathematical treatment has been adopted: but in the 
hope of being able also to obtain the interest of the statisticians and to 
draw them into the discussion, the essential statistical problems have 
been given due emphasis. 




2 BIOMETRICS, MARCH 1954 
MAIN FEATURES OF CHEMICAL EXPERIMENTATION 


Since the main field of application for which the science of experi- 
mental design was first developed is that of biology and agriculture, 
it is natural to consider this field as a standard for comparison; and 
for present purposes the particular features of chemical experimentation 
may be described conveniently and simply by stating the features that 
differ from those generally obtaining in the biological field. The chief 
features are as follows: 

(i) Experimental error is often small in comparison with the effects 
to be estimated; there is often no need for any form of local control 
(i.e., the working materials tend to be substantially homogeneous); 
and a reasonably reliable estimate of error may be available from 
previous experiments, so that one can sometimes dispense with the 
requirement that the experiment shall provide its own (internal) 
estimate of error. 

(ii) Experiments are generally carried out sequentially. In other 
words, the various experimental treatments are tested one after another, 
or at least in small groups, and at any stage of the investigation the 
results for all earlier treatments will be available to determine the most 
effective way of continuing the investigation. If the complete set of 
experimental treatments is not determined initially, a full randomisation 
will not be possible; but this drawback will generally not be serious. 

(iii) In a large proportion of cases all the experimental factors are 
quantitative rather than qualitative. 

In certain chemical investigations the circumstances may, of course, 
be very similar to those obtaining in biology (nor is the above list 
intended to suggest that biological work is always a matter of enormous 
unpredictable errors!): the present paper, however, is confined to 
consideration of the one set of features—experimental error small 
and a prior estimate available; no need for local control; sequential 
procedure; and quantitative factors. Following the example set by 
Box & Wilson (1951), the word “experiment” will be used in the rest 
of the paper to mean a single reaction (i.e., chemical operation with 
one particular combination of factor levels): although this usage is 
not customary with many statisticians, it is more acceptable to the 
chemist and is also convenient for the description of sequential in- 
vestigations. 

As regards the general objects in chemical experimentation, from 
the statistical point of view they do not differ in any marked way from 
those of biological work: in the majority of cases it is a matter of detection
and estimation either of effects, or of components of variance, or
of the relationships between several variables. There is, however, an
emphasis on the more or less direct and immediate estimation of “optimum
conditions”, e.g., conditions giving maximum yield or minimum
cost; and this is the case that will be discussed below, after a brief 
word on designs for other types of problem. 

For many chemical problems a suitable experimental design is 
to be found amongst those already developed for biology and agri- 
culture (see for instance Cochran & Cox, 1950), and these designs are 
already in fairly wide use in the chemical field. Randomised blocks, 
Latin squares and factorial designs are commonly employed to consider- 
able advantage; and in particular there is very good use being made of 
fractional replicate factorials (or partial factorials, as some prefer to 
call them). On the methods of analysis there is nothing special to be 
said except that, since factors are more often quantitative rather than 
varietal, the accent is on regression methods when factors are at more 
than two levels. It does not seem necessary to add anything more on 
these points, since the designs and analytical methods have been fully 
described in the text-books. 


DESIGN OF EXPERIMENTS FOR ESTIMATION OF OPTIMUM CONDITIONS†
GENERAL CONSIDERATIONS 


A typical problem encountered in chemical research and develop- 
ment is that of finding the optimum conditions for some reaction 
process; and it may be assumed for sake of argument that it is required 
to maximise the yield of a given product, since the alternative problem 
of minimising the yield of an undesired by-product or of minimising 
cost amounts to the same thing as regards experimental design. It is
also assumed that the particular factors needing to be investigated and
adjusted have already been discovered from preliminary work, and
that these factors are all exactly measurable quantitative continuous 
variables. 

In considering such a problem it is helpful to picture the correspond- 
ing geometrical model given by an ordinary physical hill or mountain 
which may be described by means of a contour diagram (Fig. 1). This 
diagram represents a 3-dimensional model, $x_1$ and $x_2$ being the levels of
two experimental factors (e.g., reaction temperature and concentration
of one reactant), and the contours representing yield, measured upwards
perpendicular to the plane of the paper. In practice, of course, one will
generally be concerned with more than two factors, corresponding to a
model in space of more than three dimensions; but the simpler model in
three dimensions is adequate for purposes of the present argument.
The object of the investigation is to determine the co-ordinates $x_1$, $x_2$
of the summit of the hill.


†It must be made clear that the following account is merely a summary of the ideas and procedure
already described by Box and Wilson (1951) and that the credit for these developments is entirely theirs.


FIG. 1. CONTOUR REPRESENTATION OF REACTION SURFACE (axes $x_1$ and $x_2$)


The so-called “classical” procedure of altering one factor at a time
is clearly liable to be rather ineffective, particularly when the number 
of factors is larger than two: starting from a given base point (the 
best prior estimate of the optimum conditions) the procedure will, 
except in a few lucky cases, result in a devious route towards the summit, 
requiring a relatively large number of experiments. The alternative of 
carrying out experiments at points on a grid may also be ineffective, 
since a large number of points will be required if the grid is fairly fine 
and covers a wide area; and if the grid is coarse, or if it is confined to a 
small area, the summit may well be missed entirely, or at least it may be 
impossible to obtain any reasonably accurate estimate of the optimum. 

It will be realised that the grid method is identical with ordinary 
factorial design; and accordingly it will be asked whether the difficulty 
can be overcome by using fractional replication, interpolating between 
(or extrapolating beyond) the experimental points by fitting a suitable 
form of polynomial regression equation. Unfortunately it cannot. If 
the experiments were to cover a wide area, so that one could be reason- 
ably sure of including or nearly including the maximum, an adequate 
fit would probably only be obtainable by using a polynomial of the 
third or higher degree; and if they were restricted to a small area that 
might be far from the maximum, the same high degree of polynomial 
would probably again be required, for making a rather long extrapolation:
but these requirements can only be met by a fractional replicate
that still involves a relatively large number of experiments. 

The key to the whole problem lies in making full use of the sequential 
nature of the test procedure, by carrying out experiments in a sequence 
of small groups, estimating simple regressions that give approximate 
local fits to the reaction surface, and using the estimated regression 
from each group as a guide to the most effective disposition of experi- 
mental points in the next group. 

Before describing the procedure in greater detail, there is one 
particularly important point to be noted. The true equation of the 
reaction surface will probably be something relatively complicated 
involving, for instance, a polynomial of the fifth degree; whereas the 
various stages of the investigation will consist in fitting equations of 
the first or second degree. It is essential to realise that, in addition to 
ordinary experimental error arising from uncontrolled variables, the 
estimated derivatives will generally be biased (or, in the language of 
the theory of fractional replication, they will be aliased with higher 
derivatives). Therein lies the art of experimentation: the spacings 
between factor levels must be wide enough to produce reasonably large 
effects (compared with experimental error); but they must not be so 
wide as to introduce any serious bias in the estimated derivatives. 
There is fortunately a saving clause, however, in that at each stage the 
observed yields themselves will tend to show whether the assumptions 
of the previous stage were justified. 
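
The balance between experimental error and bias can be illustrated for a single factor. The following is a sketch only: the symbols $f$ (the true response as a function of one factor, measured from the base point), $h$ (the spacing of the two levels on either side of the base point) and $\sigma$ (the standard error of a single experiment) are introduced here for illustration and do not appear in the original account.

$$
\frac{f(h) - f(-h)}{2h} \;=\; f'(0) + \frac{h^{2}}{6}\,f'''(0) + \frac{h^{4}}{120}\,f^{(5)}(0) + \cdots ,
\qquad
\text{standard error} \;=\; \frac{\sigma}{h\sqrt{2}} .
$$

On these assumptions the estimated first derivative is aliased with the third and higher odd derivatives: the bias grows roughly as $h^{2}$ while the standard error falls as $1/h$, so that the spacing has to be chosen as a compromise between the two.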


PROCEDURE 


Starting from a base point corresponding to the best prior estimate 
of the optimum conditions, the first step is to construct and carry out 
a small initial group of experiments around this base point, to estimate 
the main effects (i.e., linear effects or first-order partial derivatives) or 
at least to show in which direction the optimum conditions may lie: 
for most cases the best design here is a $2^k$ factorial arrangement ($k$ =
number of factors), in fractional replicate if $k > 3$. Analysis of the
results will show whether or not the first-order effects are predominant 
in this particular region: if substantial interaction effects are found it 
will be concluded that the yield surface is appreciably curved, and that 
further experiments must be carried out with the immediate object of 
estimating all second derivatives (see later). 

If the first-order effects are predominant, it may be assumed that 
the surface can be represented locally by the plane 


$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots$$

(where $y$ is the yield and $x_1$, $x_2$, ... are the levels of the experimental
factors), and considerable advance “up the hill” may be made immediately
by following the calculated path of steepest ascent which
corresponds to altering each factor $x_i$ in proportion to its estimated
first derivative $b_i$, starting from the base point at the centre of the
design. Two or three further experiments at points on this calculated 
path will give a provisional estimate of the optimum conditions; but 
it is probable that considerable further improvement can be made, 
since the calculated path is not likely to pass very near the summit. A 
further group of experiments is accordingly carried out around the 
provisional optimum and a new path calculated (unless the first-order 
effects are obscured by those of second-order); and so on. 
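
The calculation of a path of steepest ascent may be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' working: the function name `steepest_ascent_path`, the $2^2$ design and the yields are hypothetical, and the step along the path is scaled (arbitrarily) so that the factor with the largest coefficient moves one coded unit per step.

```python
import numpy as np

def steepest_ascent_path(X, y, steps):
    """Fit the plane y = b0 + sum(b_i * x_i) by least squares and return
    (b0, b, path), where path holds points along the steepest-ascent direction."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])      # model matrix: 1, x_1, ..., x_k
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    b0, b = coef[0], coef[1:]
    direction = b / np.max(np.abs(b))              # each factor altered in proportion to b_i
    centre = X.mean(axis=0)                        # base point at the centre of the design
    path = np.array([centre + s * direction for s in steps])
    return b0, b, path

# Hypothetical 2^2 factorial in coded units, with made-up yields.
X = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
y = [10.0, 14.0, 12.0, 16.0]
b0, b, path = steepest_ascent_path(X, y, steps=[1, 2, 3])
print(b0, b)
print(path)
```

Further experiments would then be placed at two or three of the points in `path`, the best of them serving as the provisional optimum for the next group.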

The method of steepest ascent will clearly defeat itself in the end, 
since the first-order effects will necessarily decrease to zero as the summit 
is reached; but the method does provide a useful procedure for moving 
rapidly from a base point that is far from the optimum to a point within 
“striking distance” of it; and if a more precise estimate of optimum 
conditions is required, further improvement may be obtained by taking 
second-order effects into account, using the method about to be described. 

If the initial group of experiments comprises a $2^k$ factorial arrangement,
and if the results show the need for estimating all second-order
derivatives, then additional experiments must be carried out at certain
points. Conversion of the $2^k$ factorial into a $3^k$ factorial would, of
course, achieve the desired result; but the number of experiments would
be rather large and, moreover, a $3^k$ arrangement would appear to be
wasteful, since it provides estimates of various high-order derivatives 
that are not required. Statistical theory concerning second-order 
designs has in fact not yet been fully developed; but a useful solution 
has been given by Box & Wilson in what they call “composite” designs,
which consist of a basic factorial arrangement with additional points
situated either symmetrically around the centre of the factorial (“central
composite” design) or near to one corner of it (“non-central composite”
design). Examples of these designs, based on a $2^3$ factorial with
co-ordinates given by $x_1$, $x_2$, $x_3$ = ±1, are shown below.


                               Central                               Non-central
No. of extra points            7 (i.e., 2k + 1)                      3 (i.e., k)
Co-ordinates of extra points   0, 0, 0; $\pm\alpha$, 0, 0;           (i) $\alpha$, 1, 1; 1, $\alpha$, 1; 1, 1, $\alpha$;
                               0, $\pm\alpha$, 0; 0, 0, $\pm\alpha$  or (ii) $\alpha$, 1, −1; 1, $\alpha$, −1; 1, 1, $-\alpha$.




Values for $\alpha$ can be chosen so as to obtain any desired compromise
between precision and bias; and in particular with the central composite
design $\alpha$ may be chosen so that the design is orthogonal (i.e., so that
the estimated regression coefficients will all be mutually independent).
The advantages of the non-central composite design are that it requires
fewer additional points and it may (and usually will) be arranged so that
the centroid of the design is shifted in the probable direction of the
maximum, which is likely to improve the fit of the approximating
second-degree equation: but it has the disadvantages that the quadratic
effects will not be very precisely determined (unless $\alpha$ is large, which
will however lead to increased bias) and that they will be rather highly
correlated. A central composite design will usually be employed when 
the results of the initial group of experiments indicate that the region 
covered by it is already fairly near (and possibly includes) the maximum; 
otherwise a non-central composite design will be used. 
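
A central composite design and the orthogonality condition can be checked numerically, as in the sketch below. Several points are assumptions made for illustration: the helper names are hypothetical; the closed form used for $\alpha$ is the standard one for a full $2^k$ factorial with a single centre point and is not given in the text; and orthogonality is tested with the squared columns centred on their means, which is one reading of the condition stated above.

```python
import itertools
import numpy as np

def central_composite(k, alpha):
    """Full 2^k factorial, one centre point, and 2k axial points at +/- alpha."""
    factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    centre = np.zeros((1, k))
    axial = np.vstack([sign * alpha * np.eye(k) for sign in (1.0, -1.0)])
    return np.vstack([factorial, centre, axial])

def orthogonal_alpha(k):
    """Assumed closed form: alpha^2 = (sqrt(F*N) - F) / 2, with F = 2^k factorial
    points and N = F + 2k + 1 points in all."""
    F = 2 ** k
    N = F + 2 * k + 1
    return np.sqrt((np.sqrt(F * N) - F) / 2.0)

def second_degree_columns(D):
    """Columns 1, x_i, x_i^2 (centred on their means), x_i * x_j."""
    n, k = D.shape
    squares = D ** 2 - (D ** 2).mean(axis=0)
    cross = [D[:, i] * D[:, j] for i, j in itertools.combinations(range(k), 2)]
    return np.column_stack([np.ones(n), D, squares] + cross)

alpha = orthogonal_alpha(3)
D = central_composite(3, alpha)
M = second_degree_columns(D)
G = M.T @ M                                   # sums of squares and products
off_diagonal = G - np.diag(np.diag(G))
print(round(alpha, 4), D.shape, np.abs(off_diagonal).max())
```

With this value of `alpha` the off-diagonal sums of products vanish to rounding error; other values of `alpha` leave the squared columns correlated with one another.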

Having chosen one or other type of composite design, together
with a suitable value for $\alpha$, the additional experiments are then performed
and the results used to estimate a second-degree regression of
the form

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_{11} x_1^2 + b_{22} x_2^2 + \cdots + b_{12} x_1 x_2 + \cdots$$


Examination of the coefficients of this equation, by standard methods 
of analytical geometry, will indicate the type of surface represented 
by the fitted equation, and in particular it may be seen whether the 
surface has an absolute maximum or merely a rising slope, a ridge, or a 
col (saddlepoint). 

If an absolute maximum is indicated, its co-ordinates will be cal- 
culated, one (or more) confirmatory experiments carried out at (or 
in the region of) this point, and a new regression including these further 
results fitted if necessary. This procedure of successive approximation 
may be continued until no further appreciable improvement is obtained. 
The case where the first analysis does not show an absolute maximum 
requires further consideration. It may be, of course, that the true 
yield surface does not have an absolute maximum; alternatively, the 
absence of a maximum in the fitted equation may be due merely to 
experimental error or bias (or both): but in either event the most 
profitable course to pursue will generally be shown by a canonical 
analysis of the fitted equation, which will give the form and orientation 
of the corresponding contour system. If a ridge or col is indicated, 
additional experiments should be carried out at points along the top 
of the ridge, or up the nearest side of the col (following the calculated 
path of greatest gain). It is then to be hoped that a fresh analysis will
reveal the actual nature of the yield surface more clearly, and further
progress may be made accordingly, to map the contour system in
greater detail, and to determine any absolute maximum if this is still
of practical interest.
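
The examination of a fitted second-degree equation may be sketched as below. This is an illustration under assumptions, not the working of the original paper: the function name `canonical_analysis`, the tolerance used to call a surface a ridge, and the coefficients in the example are all hypothetical. The equation is written as $y = b_0 + b'x + x'Bx$, where $B$ carries the quadratic coefficients on its diagonal and half of each interaction coefficient off the diagonal, so that the stationary point is $-\tfrac{1}{2}B^{-1}b$ and the signs of the latent roots of $B$ give the type of surface.

```python
import numpy as np

def canonical_analysis(b, B, tol=1e-8):
    """Classify the surface y = b0 + b'x + x'Bx and locate its stationary point.
    Returns (x_s, latent_roots, axes, kind); x_s is None for a ridge."""
    b = np.asarray(b, dtype=float)
    B = np.asarray(B, dtype=float)
    roots, axes = np.linalg.eigh(B)              # B is symmetric
    if np.any(np.abs(roots) <= tol * np.max(np.abs(roots))):
        return None, roots, axes, "ridge or rising slope"
    x_s = -0.5 * np.linalg.solve(B, b)           # gradient b + 2Bx = 0
    if np.all(roots < 0):
        kind = "maximum"
    elif np.all(roots > 0):
        kind = "minimum"
    else:
        kind = "col (saddle point)"
    return x_s, roots, axes, kind

# Hypothetical fit: y = 30 + 2*x1 + 1*x2 - 1.0*x1^2 - 0.5*x2^2 + 0.4*x1*x2
b = np.array([2.0, 1.0])
B = np.array([[-1.0, 0.2],
              [0.2, -0.5]])
x_s, roots, axes, kind = canonical_analysis(b, B)
print(kind, x_s, roots)
```

The columns of `axes` give the orientation of the contour system, which is what would guide the placing of further experiments along a ridge or up the side of a col.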

It should be noted that this method of attack will generally achieve 
a considerable amount of useful information in addition to that merely 
concerning the position of the maximum. In practice it will often be 
required to determine a set of compromise conditions referring to two 
or more dependent variables with maxima (or minima) at different 
positions; and in such cases it is the general form of the contour system 
that is of main interest. As Box himself has said (reply to discussion, 
Box & Wilson, 1951, p. 44): “It appears, at least in chemical application, 
that clearly defined point maxima in many variables are rather rare, 
and often the most important practical problem is to determine the 
nature of the local ridge system. Here the local fitting of second-degree 
equations followed by canonical analysis usually provides a very strong 
lead.” 

The above description is admittedly somewhat condensed; but 
it is hoped that sufficient details have been given to show the essential 
features of the problem itself and of the proposed method of attack, 
as a basis for discussion by this meeting. The results of an actual 
chemical investigation conducted along these lines will now be briefly 
presented as an illustration of the general procedure: working details 
will be omitted, however, since they have already been well described in 
Box & Wilson’s paper (and in any event would take up too much space). 


EXAMPLE 


The following investigation was carried out in the Research and
Development Department of The Distillers Co. Ltd. A reaction of the
form A (liquid) + B (gas) + C (catalyst) → D + E + other products
was being studied, on the laboratory scale; and it was required to
discover the conditions to maximise the yield of D ($y_1$; moles % of
D per mole of A charged) and to minimise the percentage of E in the
reaction products ($y_2$), since E was practically inseparable from D.
It was also required to maximise the efficiency of conversion of A
to D ($y_3$; moles % of D per mole of A consumed), though this was
regarded as a subsidiary requirement. Conditions giving $y_1 \approx 13$ and
$y_2 \approx 6$ were already known from preliminary work; but it was considered
that it should be possible to find conditions giving $y_1$ = 30 to 40 and
$y_2 < 1$. Experimental error was expected to be fairly small (a standard
error of about 1.5 for $y_1$ and about 0.5 for $y_2$); but each experiment was
rather time consuming, involving a reaction and an analytical distillation
taking 4-5 days, and it was essential to keep the number of experiments
down as far as possible.

After preliminary discussion between the chemist and the statistician 
it was decided to investigate the four factors listed below, starting with 
a half-replicate $2^4$ arrangement, with factor levels that appeared most
suitable in the light of past experience and from general theoretical
considerations. 


Factor                            Variable
(i)   Proportion of B to A        $x_1$
(ii)  Proportion of C to A        $x_2$
(iii) Reaction temperature        $x_3$
(iv)  Rate of feed of B           $x_4$


For purposes of statistical analysis the factors were transformed to
new variables $x_i$; the first, second and fourth factors were first transformed
to logarithms for theoretical reasons, and then all were subjected
to linear transformations so that the co-ordinates of points in the
initial factorial design were given simply by $x_i = \pm 1$.
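
The transformation to coded units can be sketched as follows. The actual factor levels are not given in the paper, so the function name `coded_level` and all numerical levels below are hypothetical and serve only to show the arithmetic: an optional logarithm, followed by a linear map that sends the lower factorial level to −1 and the upper level to +1.

```python
import math

def coded_level(value, low, high, log_scale=False):
    """Map a factor level to coded units so that low -> -1 and high -> +1.
    With log_scale=True the logarithm is taken before the linear map."""
    if log_scale:
        value, low, high = math.log(value), math.log(low), math.log(high)
    centre = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return (value - centre) / half_range

# Hypothetical levels: a proportion tested at 1.0 and 4.0 (log scale),
# and a temperature tested at 60 and 80 (linear scale).
print(coded_level(4.0, 1.0, 4.0, log_scale=True))
print(coded_level(2.0, 1.0, 4.0, log_scale=True))
print(coded_level(90.0, 60.0, 80.0))
```

On the logarithmic scale the centre of the design falls at the geometric mean of the two levels, and levels outside the initial factorial simply receive coded values beyond ±1.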

The general course of the investigation is summarised in Table 1, 
which gives the co-ordinates of the experimental points and the corresponding
observed values of $y_1$, $y_2$ and $y_3$. The results of the first
group of experiments (1-8) showed a number of definite main effects;
but for both $y_1$ and $y_3$ there were indications of an appreciable interaction
between $x_1$ and $x_3$ and/or between $x_2$ and $x_4$ (these two interactions
were aliased), and it was decided to proceed at once to estimation
of second-order effects with some form of composite design (Expts. 
9-16). The main effects estimated from Expts. 1-8 fortunately indicated
that the optima for all three dependent variables lay in much
the same direction (except for a conflict between $y_1$ and $y_3$ as regards
$x_1$, which was settled by temporarily ignoring $y_3$); and it was evidently
desirable to adopt a non-central composite arrangement with additional 
points in this direction: but since the original $2^4$ factorial design was
carried out in half-replicate, and since the general equation of second 
degree contains fifteen constants to be fitted, it was necessary to add at 
least seven extra points instead of merely the four (Expts. 9-12) cor- 
responding to the non-central composite design given in the previous 
section. Expts. 13-16 were added with the intention of covering these 
requirements: but unfortunately the corresponding co-ordinates were
chosen incorrectly and, owing to pressure of circumstances and the
author’s inexperience at that time in the numerical inversion of large
matrices, it was not discovered until the experiments were nearly
completed that the matrix of sums of squares and products was singular.
This was admittedly a bad mistake. The situation was retrieved,
however, by temporarily omitting from the fitted equation two of the
interaction terms $b_{ij}$ (which were expected to be relatively
small) and then introducing them in the fitting at a later stage.


TABLE 1

Experimental conditions and results

Group  Expt.   $x_1$   $x_2$   $x_3$   $x_4$    $y_1$   $y_2$   $y_3$

  1      1     −1      −1      −1      −1       11.2    7       26.8
         2     −1      −1       1       1       11.0    7.5     25.5
         3     −1       1      −1       1       10.8    9       22.5
         4     −1       1       1      −1       14.3    2       31.6
         5      1      −1      −1       1       10.6    9       16.3
         6      1      −1       1      −1       20.0    2       28.7
         7      1       1      −1      −1       12.8
         8      1       1       1       1       17.2    2.5     24.9

  2      9      3       1       1      −1        1.7    2        1.7
        10      1       3       1      −1       20.1    1       26.6
        11      1       1       3      −1       27.4    0.5     34.2
        12      1       1       1      −3       19.3    1       25.8
        13      0       0       0       0       15.1    5       26.1
        14      1       1       1      −1       21.2    1       28.4
        15      2       2       2      −2       19.2    1       20.7
        16      3       3       3      −3        2.7    0        2.7

  3     17      1      −1       1      −1       17.3    2.5     24.4
        18      1       1       8      −1       32.9    0       43.8
        19      0.35    2       6      −1       34.4    0       49.5
        20      0.35    2       9.5    −1       27.7    0       41.8

  4     21      0.75    3       4      −1       25.8    0.5     33.4
        22      0.75    1       4       0.5     33.6    0       41.7
        23      1.25    3       8      −1       32.5    0.5     44.0
        24      1.25    1       8       0.5     29.6    0.5     36.8

  5     25      0.95    1.84    6      −0.59    37.2    0       47.6
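
The singularity can be checked numerically from the co-ordinates of Expts. 1-16, as in the sketch below. Two points are assumptions: the helper name `second_degree_matrix` is hypothetical, and the co-ordinates of Expts. 1 and 7, which are partly illegible in Table 1, are completed on the supposition that Expts. 1-8 form the half-replicate with $x_1 x_2 x_3 x_4 = +1$ (as the six legible rows do).

```python
import itertools
import numpy as np

# Co-ordinates (x1, x2, x3, x4) of Expts. 1-16 as read from Table 1.
# Rows 1 and 7 are completed by assumption (half-replicate with x1*x2*x3*x4 = +1).
points = np.array([
    [-1, -1, -1, -1], [-1, -1, 1, 1], [-1, 1, -1, 1], [-1, 1, 1, -1],
    [1, -1, -1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, 1, 1, 1],
    [3, 1, 1, -1], [1, 3, 1, -1], [1, 1, 3, -1], [1, 1, 1, -3],
    [0, 0, 0, 0], [1, 1, 1, -1], [2, 2, 2, -2], [3, 3, 3, -3],
], dtype=float)

def second_degree_matrix(D):
    """Model matrix for the general second-degree equation:
    columns 1, x_i, x_i^2 and x_i * x_j (fifteen constants when k = 4)."""
    n, k = D.shape
    cross = [D[:, i] * D[:, j] for i, j in itertools.combinations(range(k), 2)]
    return np.column_stack([np.ones(n), D, D ** 2] + cross)

X = second_degree_matrix(points)
rank = np.linalg.matrix_rank(X)
print(X.shape, rank)
```

On these co-ordinates the model matrix has fifteen columns but a rank of only thirteen, so that the matrix of sums of squares and products cannot be inverted until two of the constants are dropped; Expts. 13-16 all lie on one straight line through the origin and therefore add little independent information.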

Mr. Box kindly pointed out subsequently, in a private communication
to the author, that a satisfactory non-central composite design
may be obtained from a three-quarter-replicate $2^4$ factorial. In the
present case, therefore, Expts. 13-16 should have been carried out at
. . . and (1, 1, 1, −1).

Analysis of the results of Expts. 1-16 failed to indicate any maximum
in $y_1$ or $y_3$, or a minimum in $y_2$; and the conditions for a further experiment
were accordingly worked out by considering the path of greatest
gain for $y_1$, which suggested a provisional optimum at $(x_1, x_2, x_3, x_4)$ =
(0.35, 2.0, 6.0, −1.0), with predicted results $\hat{y}_1 = 35.8$, $\hat{y}_3 = 46.3$.
(Chronologically this became Expt. 19.) Conditions for the other three
experiments in Group 3 were arrived at as follows: Expt. 17 was a
repeat of Expt. 6 which required confirmation for certain practical
reasons; and Expts. 18 and 20, with $x_3 = 8$ and $x_3 = 9.5$, were the result
of receiving some information from an outside source.

The next analysis was performed on the results of Expts. 1-18 only: 
the procedure was not entirely sequential, since there was always a 
certain unavoidable lapse of time between the end of a reaction and 
conclusion of the analysis of the results (owing to the time required for 
fractionation of the products, chemical analysis and statistical analysis), 
and it was required to start Expts. 21-24 before a further formal analysis 
including the results of Expts. 19 and 20 could be made. The variable 
$y_2$ was ignored in this analysis, since it was obviously reduced to a
negligible figure at values of $x_3 > 3$; and the new equation for $y_3$ again
showed no maximum. The equation for $y_1$ showed a maximum at
$(x_1, x_2, x_3, x_4)$ = (1.25, 1.75, 11.01, −2.85), giving estimated values
$\hat{y}_1 = 35.6$, $\hat{y}_3 = 48.8$; but from the elements of the covariance matrix
it was evident that the estimated levels for $x_3$ and $x_4$ were not very
precise; and from the results of Expts. 19 and 20, which became available
for inspection shortly before Expts. 21-24 were started, it appeared
that the optimum levels of $x_3$ and $x_4$ were more probably in the neighbourhood
of 6.0 and −1.0 respectively. It was also seen that higher
values of $x_3$ were inadvisable, owing to the formation of appreciable
amounts of an undesirable by-product F which had not been observed
in any of the first 17 experiments. The conditions for Expts. 21-24,
which were carried out in order to give a better mapping of the yield
surface in the neighbourhood of the optimum, were accordingly taken
so as to bracket the level $x_3 = 6.0$, with various combinations of levels
of the other factors so as to bracket the optimum in similar fashion. 

A final estimate of the optimum conditions was obtained by analysis
of the results from all 24 experiments: the coefficients in the fitted
equations (Table 2) showed an absolute maximum for $y_3$ as well as for $y_1$,
and the co-ordinates of the maxima are given in Table 3. The estimate
$x_4 = -6.54$ for maximum $y_3$ is far outside the range of levels actually
tested (largely due to the experiments having been conducted mainly
with reference to $y_1$ rather than $y_3$), so that this result could only be
taken as a rough approximation; but in any event the value of $\hat{y}_3$
corresponding to maximum $\hat{y}_1$ was regarded as satisfactory for current
practical requirements. The over-all fit of the final equations was not
particularly good, the residual mean squares (9 d.f.) corresponding to
standard errors of 3.1 and 3.4, which reflect the considerable degree of
bias that must have arisen chiefly through having to make a rather long
extrapolation in $x_3$; but the observed values of $y_1$ and $y_3$ are themselves
sufficient to indicate that the method was at least successful in giving
considerable improvement with a moderate number of experiments.
The canonical analysis of the fitted equations is given in the Appendix.


TABLE 2
Coefficients of final fitted equations

              $y_1$      $y_3$                    $y_1$      $y_3$
$b_0$        18.273     28.450      $b_{23}$      0.249      0.350
$b_1$         1.920     -2.392      $b_{24}$      0.535      0.762
$b_2$         1.283      0.720      $b_{34}$     -0.088     -0.380
$b_3$         3.362      4.098      $b_{11}$     -3.242     -2.815
$b_4$        -0.988      1.620      $b_{22}$     -0.705     -0.712
$b_{12}$      0.138      0.433      $b_{33}$     -0.318     -0.363
$b_{13}$      0.662      0.555      $b_{44}$     -0.487     -0.462
$b_{14}$     -0.013      0.755
TABLE 3
Co-ordinates of estimated maxima

                                $x_1$    $x_2$    $x_3$    $x_4$    $\hat{y}_1$    $\hat{y}_3$
Max. for $y_1$                   1.09     2.09     7.32    -0.58      33.2          43.2
Max. for $y_3$                  -0.61    -1.21     8.01    -6.54       8.7          50.5
Max. for $y_1$ at $x_3 = 6$      0.95     1.84     6.00    -0.59      32.8          42.7
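
The co-ordinates of the maximum for $y_1$ in Table 3 can be checked against the coefficients in Table 2. The sketch below is an illustration only, not the author's own computation: it assumes the fitted equation has the usual second-degree form $y = b_0 + \sum b_i x_i + \sum b_{ii} x_i^2 + \sum_{i<j} b_{ij} x_i x_j$, writes it as $y = b_0 + x'b + x'Bx$, and solves $b + 2Bx = 0$. The variable names are chosen here for convenience. Because the published coefficients are rounded, the result should land close to, rather than exactly on, the tabulated values.

```python
import numpy as np

# Coefficients of the fitted second-degree equation for y_1, copied from Table 2.
b0 = 18.273
b = np.array([1.920, 1.283, 3.362, -0.988])            # b_1 .. b_4
quad = {(1, 1): -3.242, (2, 2): -0.705, (3, 3): -0.318, (4, 4): -0.487}
cross = {(1, 2): 0.138, (1, 3): 0.662, (1, 4): -0.013,
         (2, 3): 0.249, (2, 4): 0.535, (3, 4): -0.088}

# Symmetric matrix B: pure quadratic coefficients on the diagonal and half of
# each cross-product coefficient off the diagonal, so that y = b0 + x'b + x'Bx.
B = np.zeros((4, 4))
for (i, j), v in quad.items():
    B[i - 1, j - 1] = v
for (i, j), v in cross.items():
    B[i - 1, j - 1] = B[j - 1, i - 1] = v / 2.0

# Stationary point: the gradient b + 2Bx vanishes.
x_s = np.linalg.solve(2.0 * B, -b)

# Predicted response at the stationary point: b0 + x'b / 2.
y_s = b0 + 0.5 * b @ x_s

print(np.round(x_s, 2), round(float(y_s), 1))
```

The same calculation with one co-ordinate held fixed (for example $x_3 = 6$) reduces to solving the three remaining equations with that term moved to the right-hand side.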


In view of appreciable amounts of the by-product F having been
obtained with $x_3 > 8$, it was considered desirable to allow an adequate
margin in this factor, and an alternative working optimum was calculated
for the particular level $x_3 = 6$: the results (given in Table 3)
showed that the values of $\hat{y}_1$ and $\hat{y}_3$ were not appreciably less than those
at $x_3 = 7.32$. A final experiment (25) confirmed that these conditions
did not give rise to any detectable amount of F; and although the
observed values of $y_1$ and $y_3$ were somewhat greater than the predicted
values, the differences were scarcely greater than were to be expected
on account of the errors involved. The investigation of the present
four factors ended at this juncture, since sufficient information had
been obtained as a preliminary to developing a process suitable for
working on a larger scale.


SUMMARY 


The main features that are more or less peculiar to chemical (as 
contrasted with biological) experimentation are briefly set out, and 
attention is subsequently confined to the following case which often 
arises: experimental error small and a prior estimate available, no 
need for local control, sequential procedure and quantitative factors; 
the object of the investigation being to estimate a set of optimum 
conditions for a chemical reaction process, e.g., conditions giving 
maximum yield or minimum impurity. 

A short account is given of the methods already developed by 
Box & Wilson, which consist essentially in carrying out experiments 
in a sequence of small groups, estimating first- or second-degree re- 
gressions that give approximate local fits to the reaction surface, and 
using the estimated regression from each group as a guide to the most 
effective disposition of experimental points in the next group. These 
methods are then illustrated by an example of their use in a typical 
problem of chemical development. 


ACKNOWLEDGMENTS 


The author wishes to thank the Directors of The Distillers Company 
Limited for permission to publish this paper. The laboratory work 
was carried out under the supervision of Dr. A. R. Graham; and his 
keen and critical interest in the statistical work is gratefully acknowl- 
edged. 


REFERENCES 


Box, G. E. P. and Wilson, K. B., On the experimental attainment of optimum
conditions. J. R. Statist. Soc., B, 13, 1-45, 1951.
Cochran, W. G. and Cox, G. M., Experimental designs. New York: Wiley, 1950.


APPENDIX
Canonical Analysis of Fitted Equations


The canonical form of the second-degree equation, which corresponds
to a shift of the origin to the stationary point and rotation of
the co-ordinate axes ($Ox_i$) to coincide with the principal axes ($OX_i$)
of the corresponding k-dimensional quadric, may be written

$$y_c - y' = \lambda_1 X_1^2 + \lambda_2 X_2^2 + \cdots + \lambda_k X_k^2,$$

where $y_c$ is the value of y on a given contour surface, and $y'$ is the value
of y at the stationary point. The new variables $X_i$ are related to the
old variables $x_j$ by the equations

$$X_i = \sum_j m_{ij}(x_j - x_j'),$$

where $m_{ij}$ is the cosine of the angle between $OX_i$ and $Ox_j$, and $x_j'$ is
the value of $x_j$ at the stationary point.

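The values of the coefficients $\lambda_i$ and the matrices $M = (m_{ij})$ of direction
cosines for the two variables $y_1$ and $y_3$ are as follows:

           $\lambda_1$    $\lambda_2$    $\lambda_3$    $\lambda_4$
$y_1$      -0.236     -0.323     -0.913     -3.280
$y_3$      -0.116     -0.250     -1.068     -2.919

For $y_1$:

$$M = \begin{pmatrix}
0.104 & 0.395 & 0.875 & 0.259 \\
0.044 & -0.419 & 0.422 & -0.803 \\
0.001 & -0.818 & 0.210 & 0.536 \\
0.994 & -0.022 & -0.110 & 0.008
\end{pmatrix}$$

For $y_3$:

$$M = \begin{pmatrix}
0.153 & 0.558 & -0.059 & 0.813 \\
0.104 & 0.284 & 0.942 & -0.147 \\
0.071 & -0.777 & 0.311 & 0.542 \\
0.980 & -0.061 & -0.114 & -0.150
\end{pmatrix}$$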
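
The canonical coefficients can be reproduced, at least approximately, as the latent roots of the symmetric matrix of second-order coefficients. The following is a minimal sketch for $y_1$ only, assuming the coefficients of Table 2 and using `numpy.linalg.eigh`; it is not the desk-calculator procedure used for the paper. The sign of each row of direction cosines is arbitrary, so computed rows may differ from the printed ones by a factor of -1.

```python
import numpy as np

# Second-order coefficients of the fitted equation for y_1 (Table 2):
# b_ii on the diagonal, b_ij / 2 off the diagonal.
B = np.array([
    [-3.242,      0.138 / 2,  0.662 / 2, -0.013 / 2],
    [ 0.138 / 2, -0.705,      0.249 / 2,  0.535 / 2],
    [ 0.662 / 2,  0.249 / 2, -0.318,     -0.088 / 2],
    [-0.013 / 2,  0.535 / 2, -0.088 / 2, -0.487    ],
])

# eigh handles symmetric matrices and returns unit eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(B)

# Order the roots from smallest to largest absolute value, as in the appendix.
order = np.argsort(np.abs(eigvals))
lam = eigvals[order]
M = eigvecs[:, order].T        # row i holds the direction cosines of OX_i

print(np.round(lam, 3))
print(np.round(M, 3))
```

All four roots come out negative, which is what identifies the stationary point as a maximum; the root of largest magnitude belongs to the axis that lies almost along $Ox_1$.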


It need hardly be pointed out that these estimates of the properties 
of the contour surfaces are doubtless subject to considerable errors 
(which have not been evaluated), quite apart from the fact that in any 
event the true yield surface is probably only roughly approximated by 
an equation of second degree. It is of interest to obtain some idea of 
the shape and orientation of the fitted contour surfaces; but the results 
of the above analysis hardly warrant much more detailed consideration 
than is given below.




It will be noted first of all that the contour surface for $y_3$ is of
approximately the same shape as that for $y_1$, with two relatively long axes
($OX_1$, $OX_2$) and one relatively short axis ($OX_4$). The coefficients in
the third and fourth rows of the matrix $M$ for $y_3$ are also approximately
the same as those for $y_1$, showing that $OX_3$ and $OX_4$ are oriented in
much the same direction, in contrast to $OX_1$ and $OX_2$ which exhibit
considerable divergence.

The two relatively small coefficients $\lambda_1$ and $\lambda_2$ indicate a plateau of
near-optimum response (not very extensive), with a more marked
stationary ridge for $y_3$ along $OX_1$. Both for $y_1$ and for $y_3$ the shortest
axis $OX_4$ is practically parallel to $Ox_1$, showing that $x_1$ is the most
critical of the four variables (in the scale adopted, that is).




THE EXPLORATION AND EXPLOITATION OF RESPONSE 
SURFACES: 


SOME GENERAL CONSIDERATIONS AND EXAMPLES* 


G. E. P. Box 
Imperial Chemical Industries, Dyestuff Division Headquarters, 
Manchester, England, 
and the Institute of Statistics, University of North Carolina 


Some three years ago Dr. K. B. Wilson and the author read a paper 
[1] before the Royal Statistical Society concerning the experimental 
attainment of optimum conditions. The methods there discussed 
grew out of experience acquired in a number of chemical investigations. 
Since that time many such investigations have been conducted in 
England along the lines we suggested, and more recently these procedures 
have been utilised at North Carolina State College in experiments
concerning such widely different topics as the flooding capacity of 
pulse columns (work performed under contract with the Atomic Energy 
Commission), and the motility and viability under various storage 
conditions of bull sperms. The object of this paper is to discuss and 
to illustrate with examples certain ideas which have arisen from the 
work and which it is believed may be of value in a wider field than that 
of chemical research. We shall concern ourselves with general principles. 
Details of the methods and calculations have been explained in [1] 
and more recently have been presented in a simplified form in the 
final chapter of a book on the design of experiments authored by a 
team of Imperial Chemical Industries’ Chemists and Statisticians [2].


AN OUTLINE OF THE PROBLEM 
1. Response Surfaces 


We suppose that the experimenter is concerned with elucidating
certain aspects of a functional relationship

$$\eta = \phi(x_1, x_2, \cdots, x_k) \qquad (1)$$

connecting a response $\eta$ such as yield with the levels $x_1, x_2, \cdots, x_k$
of a group of k quantitative variables or “factors” like temperature,
time, and pressure of reaction, concentration of reactants and speed of
agitation. More generally he will be concerned not with a single

*Sponsored by the Office of Ordnance Research, United States Army under contract DA-36-034- 
ORD-1177. 




response but with a number of responses which he will wish to bring
to the most satisfactory levels possible. For example he will often be
seeking conditions which maximize a major response $\eta_1$ such as yield
(or more often cost of manufacturing unit quantity of product), while
maintaining some auxiliary response $\eta_2$ such as purity, at the best level
possible. To begin with we confine our discussion to a single response
$\eta$ which for convenience we assume is the yield of product. When
only one factor x (such as temperature) is studied the response
function (1) could frequently be described by a graph like that in Figure 1.


FIGURE 1. A RESPONSE CURVE FOR ONE FACTOR
(Yield, per cent, plotted against temperature, the level of variable x.)


With more than one factor it is tempting to generalize this by supposing
that the response function could be represented by a surface like a
more or less symmetrical mound, the contour representation of which
would be like that shown in Figure 2.

That such a generalization is entirely inadequate in practice is
found as soon as we begin to carry out experiments in which actual
response surfaces can be roughly plotted. It is then clear not only that
surfaces are frequently attenuated in the neighborhood of maxima as
in Figure 3, but also that ridge systems like those of Figures 4 and 5 are
of common occurrence.

It will be noted that any section of these figures taken parallel to
either axis will yield a curve like that in Figure 1, so that these surfaces
are entirely built up from such curves.


FIGURE 2. CONTOURS OF A TWO-FACTOR RESPONSE SURFACE
(Contours of yield at 60, 70 and 80; axes: level of variable $x_1$, temperature, and level of variable $x_2$.)

FIGURE 3. ATTENUATED MAXIMUM

FIGURE 4. STATIONARY RIDGE

2. Factor Dependence. 


The reason for the occurrence of such systems can be seen when it is
remembered that factors like temperature, time, pressure, concentration,
etc., are only regarded as “natural” variables because they happen to
be quantities that can be conveniently measured separately. A more
fundamental variable not directly measured, but in terms of which
the behaviour of the system could be described more economically
(e.g. frequency of a particular type of molecular collision), will often be
a function of two or more natural variables (e.g., temperature,
concentration, pressure). For this reason many combinations of natural
variables may correspond to the best level of a fundamental variable. To
quote a simple example, suppose that in the region of interest the
response was most economically described in terms of some fundamental
variable the level of which was proportional to the product $w = x_1 x_2$
of two measured variables $x_1$ and $x_2$. Thus $w = x_1 x_2$, and $\eta$ is a function of $w$ alone.
Suppose that the latter relationship was that represented in Figure 6a.


FIGURE 5. RISING RIDGE
(Contours of yield; axes: level of $x_1$ and level of $x_2$.)

If the experimenter did not know that the system could adequately be
described in terms of the compound variable $w$ and carried out
experiments in which $x_1$ and $x_2$ were varied separately, he would be exploring
a system for which the yield surface was that shown in Figure 6b, which
is like that in Figure 4. Figure 6b, which describes a commonly occurring
practical situation, was constructed by drawing contour lines through
those points giving a constant product $x_1 x_2$, the appropriate yield being
read from Figure 6a.
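
The construction of such a ridge can be sketched numerically. In the example below the response curve in $w$ and its peak at $w = 100$ are assumed purely for illustration (they are not read from Figure 6a), and the function names are hypothetical.

```python
import numpy as np

def yield_from_w(w):
    # Hypothetical response curve in the compound variable w,
    # peaking at w = 100 (an assumed value chosen for illustration).
    return 80.0 - 0.01 * (w - 100.0) ** 2

def yield_surface(x1, x2):
    # The experimenter varies x1 and x2 separately, but only their
    # product w = x1 * x2 matters to the system.
    return yield_from_w(x1 * x2)

# Points along the crest x1 * x2 = 100 all give the same maximum yield.
x1 = np.array([8.0, 10.0, 12.5])
crest = yield_surface(x1, 100.0 / x1)

# Raising x1 alone leaves the crest; lowering x2 in compensation returns to it.
off_crest = yield_surface(12.5, 10.0)
back_on_crest = yield_surface(12.5, 8.0)

print(crest, off_crest, back_on_crest)
```

Every pair of levels with the same product gives the same yield, so the maximum is attained along a curve rather than at a single point.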

Instead of the function $w = x_1 x_2$ we might have considered other
functions $w = f(x_1, x_2)$. It will be found for example that the functions
$w = a + bx_1 + cx_2$, $w = ax_1^b x_2^c$, $w = ax_1^b \exp\{-(c/x_2)\}$ all produce
diagonal ridge systems running from the top left to the bottom right
of the diagram like that in Figures 4 and 6b, while the functions
$w = a + bx_1 - cx_2$, $w = ax_1^b x_2^{-c}$, $w = ax_1^b \exp\{-(cx_2)\}$ all produce ridges running
in the contrary sense. These ridge systems are of course associated with
interaction. The former type is associated with a negative two-factor
interaction between $x_1$ and $x_2$ whilst the latter is associated with a
positive two-factor interaction.

FIGURE 6a (yield plotted against $w$) and FIGURE 6b (contours of yield against $x_1$ and $x_2$).
GENERATION OF A RIDGE SYSTEM

The simplest type of ridge system is that produced by the linear
relationship $w = a + bx_1 + cx_2$. The ridge system generated by such
a function will have parallel straight-line contours running in a direction
determined by the relative magnitudes of b and c. A section at right
angles to these contours will reproduce the original graph of $\eta$ on $w$.
We note (for future use) that over limited ranges we should expect all
ridge systems to be capable of approximate representation by this simple
linear type. For, expanding $w = f(x_1, x_2)$ in a Taylor’s series about a
local origin, we have $w = a + bx_1 + cx_2$ + (terms of second and higher
order), where a is the value of the function at the origin, and b and c
are the values of the partial derivatives with respect to $x_1$ and $x_2$ at
the origin.

We could say in the examples quoted above that there is a ‘redundancy’
of one variable. The apparently two-variable system can be
expressed in terms of a single fundamental ‘compound’ variable $w$.
The physical and biological sciences abound with examples where, 
over suitable ranges of the variables, relationships exist similar to the 
above. 

Frequently the surface which describes the system, while not of the
form shown in Figure 4, nevertheless contains a marked component
of this type together with an additional component resulting in an
elongated maximum like that shown in Figure 3, or a system like that
shown in Figure 5, where the ridge is steadily rising to higher yields.
In the latter case the best practical combination may be the most
extreme point on the ridge that can be attained by the experimenter.
The situations illustrated in Figures 3, 4, and 5 may be said to show
factor dependence in the sense that the response function for one factor
is not independent of the levels of the other factors. The idea of factor
dependence is analogous to that of stochastic dependence, and diagrams
like Figure 3 call to mind the familiar contour representation of a
bivariate probability surface. Again there exists the same analogy
between the ‘fundamental variables’ we have discussed and the ‘factors’
in factor analysis (see for example Thurstone [3]).

These analogies are helpful but it must be emphasized that the two 
types of dependence are quite distinct and care should be taken to 
differentiate them. We are concerned here with the relationship between 
a response like yield and the levels of a set of variables which can be 
varied at our choice like temperature, time, etc. We do not need any 
ideas of probability to define this relationship. In the analogous 
stochastic situation probability takes the place of response and
stochastic variables such as ‘test scores’ take the place of the fixed variables
like temperature, time, etc.

The investigation of factor dependence is of considerable practical 
importance. 

(i) Where the surface is like that in Figure 4, not one, but a whole 
range of alternative optimum processes corresponding to points along 
the crest of the ridge will be available from which to choose. In practice 
some of these processes may be far less costly, or more convenient to 
operate, than are others. The factors in this system may be said to be 
compensating in the sense that departure from the maximum response 
due to change in one variable can be compensated by a suitable change in 
another variable. The direction of the ridge indicates how much one 
factor must be changed to compensate for a given change in the other. 

If we imagine the contours for some auxiliary response $\eta_2$
superimposed on those for the major response $\eta_1$, we see that (provided the
contours of the two systems are not exactly parallel) we could select
for our optimum process that point on the crest of the ridge for the
major response which maximized the auxiliary response. Thus both
the major response and the auxiliary might simultaneously be brought
to their best levels.

(ii) When the surface is like that in Figure 3 the direction of
attenuation of the surface indicates those directions in which departures
can be made from the optimum process with only small loss in response.

(iii) The detection of a rising ridge in the surface like that in Figure 5
supplies the knowledge that if the variables are changed together in the
direction of the axis of the ridge then yield improvement is possible
even though no improvement is possible by changing any single variable.

(iv) Of by no means least importance is the fact that discovery of
factor dependence of a particular type may, in conjunction with the
experimenter’s theoretical knowledge, lead to a better understanding of
the basic mechanism of the reaction. Thus if experimentation with
the variables $x_1$ and $x_2$ produced a surface like that of Figure 6b he
would be led to expect that some more fundamental variable of the
type $w = x_1 x_2$ existed.

So far we have illustrated our discussion with examples in which
there were only two variables $x_1$ and $x_2$. Such examples can be
illustrated geometrically either by means of a three dimensional diagram
in which two dimensions are used to accommodate the variables $x_1$ and
$x_2$ and the third to accommodate the response $\eta$, or by two dimensional
diagrams like Figures 2, 3, 4, and 5 in which the response $\eta$ is
represented by contour lines.

The situation which can occur in many variables becomes progressively
more complicated and there is a distinct danger that by getting
into the habit of thinking in only 2 variables we may over-simplify the
problem. This danger is lessened if we become familiar with a method
for representing the relationship between 3 variables $x_1$, $x_2$, $x_3$ and a
response $\eta$ by a three dimensional diagram in which the contour surfaces
for $\eta$ are shown. Figure 7 shows such a representation.

This diagram can be regarded as being built up from two dimensional
contour diagrams. For example suppose that a series of two dimensional
yield-contour diagrams for the variables temperature ($x_1$) and
concentration ($x_2$) were made for various levels of time ($x_3$). If these
were drawn on sheets of transparent material and the sheets were
then placed one behind the other at the appropriate points on the time
axis, on joining up corresponding contour lines to form contour surfaces,
we would obtain a diagram like Figure 7.

In this representation in the neighborhood of a symmetrical point
maximum (analogous to that in two dimensions shown in Figure 2)
the contours would enclose the maximum point like the skins of an
onion around its center. Insensitivity to change in conditions when
variables were changed together in a certain way would correspond to
attenuation of the contours in the direction of the compensatory changes.
As an extreme form of this attenuation we could imagine a line maximum
in the space (i.e. a line such that all sets of conditions on it gave
the same high yield) surrounded by cylindrical contours of falling yield.


FIGURE 7. CONTOUR REPRESENTATION OF THREE-FACTOR SYSTEM
(Axes labelled in the figure include temperature and concentration.)


We would call this a line stationary ridge since it is analogous to the 
two dimensional case illustrated in Figure 4. Many other possibilities 
occur but we shall leave their fuller discussion till later. 

It should not be thought that factor dependence is only to be expected 
in connection with well defined physical and chemical phenomena, 
nor that ‘single variable’ redundancy is all that need concern us. 

Suppose, for example, the problem concerned the making of a
cake, the object being to bring some desirable property such as ‘texture’
(which is our response $\eta$ and which we suppose can be measured on
some numerical scale) to the ‘best’ level. The natural variables, k
in number, whose levels we denote by $x_1, x_2, \cdots, x_k$, might be numerous
and involve among other things the amounts of such substances as
baking powder ($x_1$), flour ($x_2$), egg white ($x_3$) and citric acid ($x_4$).

Suppose it had happened that the ‘texture’ $\eta$ depended in reality
on only two fundamental variables, the consistency of the mix $w_1$ and
its acidity $w_2$, and that optimum texture was attained whenever $w_1$
and $w_2$ were at their optimum levels $w_{10}$, $w_{20}$. For simplicity we will
make the somewhat unreal assumption that $w_1$ and $w_2$, measured in
some suitable way, were linear functions of the amounts $(x_1, x_2, \cdots, x_k)$
of the various ingredients, i.e.,


$$\text{consistency} \quad w_1 = a_1 + b_1 x_1 + c_1 x_2 + d_1 x_3 + \cdots + p_1 x_k$$
$$\text{acidity} \quad w_2 = a_2 + b_2 x_1 + c_2 x_2 + d_2 x_3 + \cdots + p_2 x_k$$

Each of the coefficients $b_1, c_1, \cdots, p_1$ in the first equation would
measure the change in consistency due to adding a unit amount of the
corresponding ingredient. Thus the coefficient would be positive for
solid substances like flour and negative for liquid substances like water.
The optimum consistency would thus be obtained on the k - 1 dimensional
planar surface in the k dimensional space of the factors

$$w_{10} = a_1 + b_1 x_1 + c_1 x_2 + d_1 x_3 + \cdots + p_1 x_k.$$

Similarly, each of the coefficients $b_2, c_2, \cdots, p_2$ in the second equation
would measure the change in acidity due to the addition of a unit of
the corresponding ingredient and would be positive for acid substances
and negative for alkaline substances. The optimum acidity would
thus be obtained on a second k - 1 dimensional planar surface

$$w_{20} = a_2 + b_2 x_1 + c_2 x_2 + d_2 x_3 + \cdots + p_2 x_k.$$


The intersections of the two planes would give all the levels for which 
both acidity and consistency were at their best levels, and hence for 
which optimum texture was attained. Optimum texture would thus 
be attained on a plane of k — 2 dimensions. That is to say there would 
be k — 2 directions at right angles to each other in which we could 
move while still maintaining optimum texture for the cake. We can 
express the k variable system in terms of two fundamental ‘compound’ 
variables w, and w, ; there is thus a redundancy of k — 2 variables 
corresponding to the k — 2 dimensions in which the maximum is at- 
tained. Were it not for the danger of confusion involved in using the 
phrase in the present context we would say that the maximum had 
k — 2 ‘degrees of freedom’. 
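
The count of k - 2 free directions can be illustrated numerically. The sketch below takes k = 4 ingredients with wholly hypothetical coefficients (none of the numbers comes from the paper) and finds the directions in the factor space along which neither consistency nor acidity changes, using the singular value decomposition from NumPy.

```python
import numpy as np

# Hypothetical coefficients for k = 4 ingredients (all values are assumed).
# Row 1: change in consistency w_1 per unit of each ingredient.
# Row 2: change in acidity w_2 per unit of each ingredient.
A = np.array([
    [ 0.5, 2.0, -1.0, 0.1],
    [-0.8, 0.0,  0.2, 1.5],
])
k = A.shape[1]

# Directions that change neither w_1 nor w_2 form the null space of A.
# The trailing rows of Vt from the full SVD span that null space.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
null_basis = Vt[rank:]          # (k - rank) mutually orthogonal unit vectors

print(null_basis.shape[0])      # number of redundant directions
print(np.allclose(A @ null_basis.T, 0.0))
```

Moving from any optimum recipe along a combination of these directions leaves both fundamental variables, and hence the texture, unchanged.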

The surface of this example is a generalisation of the stationary ridge
of Figure 4. Rising ridges, attenuated maxima and combinations of
these phenomena will all have generalisations in many dimensions.
We shall discuss these generalisations later.


It is clear that the experimenter should have adequate investigational
tools to explore response surfaces. For if he can appreciate at
least approximately the main features of the surface with which he is
dealing, this may lead not only to process improvement but also to his
obtaining a more fundamental understanding of basic process mechanisms.


3. The Roles of the Statistician and the Experimenter. 


We have been discussing from a geometrical standpoint the response 
function which underlies a given system and certain features of which 
it is desired to elucidate experimentally. We can perhaps avoid some 
misunderstandings by using our geometrical analogy to delineate more 
clearly the problems in experimentation which are statistical and those 
which are essentially non-statistical and in which this science cannot 
be of help. We shall speak in what follows of the experimenter (by 
which we mean the chemist, biologist, or engineer who is conducting the 
experiments), and the statistician as two individuals. Occasionally the 
experimenter will be his own statistician in which case the terms will 
indicate the questions which will require his statistical skill and those 
which require his other knowledge. The statistician’s role is not, and 
cannot be, to design experiments in any absolute sense. The statis- 
tician’s function is to advise the experimenter on the best positioning 
of experimental points in a space which the experimenter must of 
necessity construct for him, and construct purely on the basis of the 
experimenter’s expert background knowledge of the subject in which 
he is experimenting. Were this not so there would be little point in
training chemists, biologists, physicists, etc., only statisticians.

The experimenter decides, 


(i) which factors should be varied: Because the amount of effort
which can be exerted on any given problem is in practice not unlimited
he must often choose a few factors which he believes will be important
out of a larger number which might be.

(ii) in what way the factors should be varied: They might be varied 
directly or as ratios or products of the “natural” variables as we have 
discussed above. 

(iii) by how much the factors should be varied: He decides both 
absolutely and relatively the amounts by which the factors should be 
varied, and hence tacitly the units in which the surface is imagined to 
be drawn. 


He thus decides which particular transform of which particular factor 
space shall be considered. 

In a given problem different experimenters might well include or
fail to include different variables and vary them in different ways by
different amounts. The statistician may help to some extent by general
words of advice on these matters, but when he has given this advice
and the experimenter has decided the questions mentioned above, he
can only devise experiments which explore the space the experimenter
has defined as efficiently as possible. The ultimate success of the
experiment as a whole (in contrast to the statistical exercise) must
necessarily depend on the skill of the experimenter. No amount of
artistry in statistical design can compensate for the omission of the
most important factor.

It should be borne in mind that the major indeterminacies mentioned
above are major indeterminacies of experimentation as a whole
and are not associated with the use of any particular method. We can
take comfort in the thought that in spite of them scientific research 
often manages to achieve useful results. 


METHODS FOR EXPLORING THE RESPONSE SURFACE 
4. General Considerations. 


In order to determine and hence exploit multi-factor dependence we 
must at some stage perform groups of experiments capable of deter- 
mining how the factors jointly influence the response. Any method 
which attempts to avoid the multi-factor problem by varying only 
one factor at a time will be almost valueless when the response surface 
contains a ridge, except possibly as a preliminary procedure. 

As an illustration, consider the behavior of the ‘one factor at a time’ 
method when the surface is like one of those in Figures 3, 4, and 5. 


(i) If the experimenter starts at the point A in Figure 5, a series of
experiments in which $x_1$ is varied keeping $x_2$ constant, performed along
AB, will lead to the conclusion that P is the best level of $x_1$. Experiments
in which $x_2$ is now varied, keeping $x_1$ at its previous best level, along
CD will lead to a point almost indistinguishable from P, and in the
presence of experimental error will lead the experimenter to the
conclusion that P is a maximum.

(ii) A similar difficulty occurs in the situation of Figure 4. Although 
the experimenter would in this case have reached a point of maximum 
yield on the ridge he would not know of the existence of the ridge and 
hence of the possibility of using alternative and often more convenient 
processes. 

(iii) In the example of Figure 3 the experimenter would at best 
follow a tortuous path to the maximum advancing by small increments 
along a zig-zag route. At worst if the error was not very small he would 
often mistake a lower point on the “ridgy” surface for a true maximum.




The problems mentioned above in the two-factor case are even more 
acute when there are more than two factors and multi-factor dependence 
has to be dealt with. 
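
The behaviour described for a rising ridge can be imitated on a made-up surface. The response function below is hypothetical and chosen only so that the yield climbs slowly along the line $x_1 = x_2$ and falls away sharply on either side; it is not fitted to any data in this paper, and experimental error is omitted.

```python
import numpy as np

def response(x1, x2):
    # Hypothetical rising ridge: the yield climbs slowly along x1 = x2
    # and falls off sharply on either side of that line.
    return 60.0 + np.sqrt(2.0) * (x1 + x2) - 5.0 * (x1 - x2) ** 2

grid = np.linspace(-3.0, 3.0, 601)        # candidate levels, step 0.01

# One factor at a time: vary x1 with x2 held at 0, then vary x2 with x1
# held at its apparent best level.
x1_best = grid[np.argmax(response(grid, 0.0))]
x2_best = grid[np.argmax(response(x1_best, grid))]
ofat_gain = response(x1_best, x2_best) - response(0.0, 0.0)

# Changing both factors together along the axis of the ridge.
ridge_gain = response(2.0, 2.0) - response(0.0, 0.0)

print(round(float(ofat_gain), 2), round(float(ridge_gain), 2))
```

On this surface a full cycle of single-factor searches gains well under one unit of yield, an amount that a moderate experimental error would hide, whereas a joint move along the ridge gains several units.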


The writer’s attention has been repeatedly drawn to the behavior of 
the one factor at a time method in studies that have been carried out to 
improve processes already operating. Usually the current process will 
have been arrived at by the “one factor at a time method”, which in
various guises has been the normal procedure used by the chemist.
When (as is often the case) further marked improvement is possible
this is usually because the one factor at a time experiments have become
“stuck on a ridge”. Approximate determination of the local surface
and exploitation of the local factor dependence is essential to further 
progress. The provision of methods for doing this is thus extremely 
important. The advantage to the experimenter in using such methods 
is not merely that fewer experiments are required to attain a given 
result which could ultimately have been reached by traditional methods, 
but that a result can be obtained that could not have been got by such 
methods. 

We may explore the system by fitting some sufficiently elastic 
surface to the observations at a suitable set of pre-selected points and 
then examining the fitted surface. In the absence of specialized know]- 
edge the most suitable model for this purpose seems to be the generalized 
polynomial equation which for two variables is 


n = Bo + (B12, + + + Bests + 
+ + + + +etc. (2) 


and corresponds to representing the function by its Taylor’s series. 
The brackets delimit the terms containing coefficients of first, second 
and third order. The equation containing all coefficients up to rth 
order is said to be of rth degree. The first degree equation defines a 
plane, the second degree equation a quadric surface. The surfaces 
defined by third and fourth degree equations we shall call cubic and 
quartic surfaces. Table 1 shows the number of terms (L) contained in
equations of degree 1, 2, 3 and 4 when the number of factors is 2, 3, 4 or 5.
The number of experiments N must be at least as great as L, the number
of constants fitted. We see therefore that the number of experiments
necessary to fit the equation will rise rapidly with its degree. However,
the higher the order of the terms included, the smaller the effect they
usually will have, and we shall include terms only to that order necessary
to give an approximate representation of the surface in the region
studied.




RESPONSE SURFACES 29 


TABLE 1

NUMBER OF CONSTANTS (L) TO BE FITTED FOR EQUATIONS OF
VARYING DEGREE

    k                    d = Degree of fitted equation
Number of
 Factors     1 (plane)   2 (quadric)   3 (cubic)   4 (quartic)
    2            3            6            10           15
    3            4           10            20           35
    4            5           15            35           70
    5            6           21            56          126
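
The entries of Table 1 follow a simple counting rule: a full polynomial of degree d in k factors has one constant for every monomial of degree at most d, and the number of such monomials is the binomial coefficient C(k + d, d). The following Python sketch reproduces the table under that rule; the function name `num_constants` is an illustrative choice and not notation from the paper.

```python
from math import comb


def num_constants(k: int, d: int) -> int:
    """Number of constants L in a full polynomial of degree d in k factors."""
    return comb(k + d, d)


# Rebuild Table 1: rows are k = 2..5 factors, columns are degrees d = 1..4.
table_1 = {k: [num_constants(k, d) for d in range(1, 5)] for k in range(2, 6)}
for k, row in table_1.items():
    print(k, row)
```

Since N must be at least L, the same function gives the minimum number of experiments for any k and d not listed in the table.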


The equation can readily be fitted to the observations by the method 
of least squares (which is in this context sometimes called the method 
of multiple regression). Using this technique estimates $b_0$, $b_1$, $b_2$, etc.
of the coefficients $\beta_0$, $\beta_1$, $\beta_2$, etc. are determined which make the sum
of squares of discrepancies between the observed values and those 
predicted by the fitted equation as small as possible. Estimates of 
this sort may be shown to have specially desirable properties. In 
practice (as we shall describe later) the N levels of the variables will be 
specially selected to give accurate estimates. 

Essentially what is being done is to use the technique of “multiple
regression” in the circumstance in which the values for the variables 
or factors are chosen in advance rather than merely observed and 
because of these specialized levels the calculations are greatly simplified. 
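
The simplification can be illustrated with a first degree equation fitted to a two-level factorial in two factors. Because the columns of such a design are mutually orthogonal, each least squares estimate reduces to a signed average of the observations. The sketch below assumes hypothetical yields (they are not data from the paper) and uses plain Python only.

```python
# Hypothetical yields at the four points of a 2^2 factorial (illustrative data).
design = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
yields = [54.0, 60.0, 58.0, 66.0]

n = len(design)
# The design columns are orthogonal and each has sum of squares n, so the
# least squares estimates are simple signed averages of the observations.
b0 = sum(yields) / n
b1 = sum(x1 * y for (x1, _), y in zip(design, yields)) / n
b2 = sum(x2 * y for (_, x2), y in zip(design, yields)) / n


def predict(x1, x2):
    """Response predicted by the fitted first degree equation."""
    return b0 + b1 * x1 + b2 * x2


# Residual sum of squares, the quantity minimized by least squares.
residual_ss = sum((y - predict(x1, x2)) ** 2 for (x1, x2), y in zip(design, yields))
print(b0, b1, b2, residual_ss)
```

With arbitrary, merely observed levels the same estimates would require the solution of simultaneous normal equations.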

For convenience we do not work directly with the “natural” variables 
like temperature, time, etc., but with the quantities $x_1$, $x_2$, etc., which
will refer to “standardized variables” in which the origin is taken to be 
the center of the design and the units are fixed by the amounts the 
natural variables are changed in the design. We shall not necessarily 
limit the relation between natural and standardized variables to be a 
linear one. In some circumstances the use of logarithms or some other 
function may be appropriate. 

An equation of fairly high degree would usually be needed to repre- 
sent the response surface adequately over the whole experimental region 
(the whole region of possible variation of the factors). The fitting of 
such a surface could involve an excessive number of experiments. We 
can examine the surface in a smaller subregion however by a fairly 
simple equation. 

If, for example, we were close enough to the ridge system or maximum 
we might represent its main features approximately by an equation of 




only second degree (see Figures 10 and 11). However the starting conditions
would often not be close to the maximum or ridge system, particularly
if the system were being investigated for the first time. The
mathematical procedure of fitting selects from all possible surfaces of the 
degree fitted that which approximates the responses at the experimental 
points the best (in the sense of least squares). The features of the fitted 
surface at points remote from the region of the experiment would 
probably bear no resemblance to the features of the actual surface. 

The experimenter needs some preliminary procedure therefore to 
bring him to a suitable point near the maximum or ridge where the 
second degree equation could most usefully be fitted. One such pre- 
liminary procedure is the one factor at a time method already discussed. 
An alternative which in the author’s opinion is usually more effective 
and economical in experiments (at least in the field of application 
where it has been tried) is the “steepest ascent” method. This preliminary
procedure followed by the fitting of an equation of second degree
(or if necessary of higher degree) is the basis of the sequential method 
proposed in [1]. 


5. A Sequential Method for Exploring the Response Surface. 


We have seen that the more elaborate the equation which is to be 
taken as a model the greater is the number of experiments which must 
be performed to fit it. If we can proceed sequentially therefore (that 
is to say if we can use the result of one set of experiments to decide the 
location of the next set), it would seem best to begin by fitting locally 
the simplest possible equation, abandoning it for a more elaborate 
one only when the circumstances showed this to be necessary. 


(i) We should begin by fitting a first degree equation representing 
a plane. If this plane seemed to provide a reasonably close fit and was 
sloping in some particular direction then progress could be made by 
proceeding up it in the direction of greatest slope, or steepest ascent. 
Experiments performed in this direction would lead to a point where 
no further gain was obtained. Here a second plane could be fitted and 
the procedure repeated. Since we would only be fitting this very simple 
type of equation any advance which could be made by such a procedure 
would be progress attained very economically. 

(ii) Sooner or later it would become clear that the slopes of the last 
fitted plane were not large compared with the second order effects and 
consequently that further progress by this method was not possible. 
This situation might be found in the original region in which experiments 
were conducted (i.e., it might be found after performing the initial set of 




exploratory experiments), or it might be found after one or two applica- 
tions of the steepest ascent procedure had brought the experimenter to 
a region of higher yield than that at the starting conditions. Thus in
one way or another a near-stationary region would be attained, that
is, a region in which little change occurred in the response when the
factor levels were changed. In the situations illustrated in Figures 3,
4, and 5 the four points at the corners of a square indicate the four
experiments arranged in a $2^2$ factorial design which might be used to
determine the first order effects. The arrow indicates the direction of
steepest ascent. The steepest ascent procedure will in each case lead 
the experimenter to a near-stationary region on the crest of the ridge 
surface. Further progress by this method would not then be possible 
but in the neighborhood of the best point attained a suitable pattern 
of experiments would be performed and the second degree equation 
could now be fitted with maximum effect. 

(iii) The analysis of this fitted second degree equation would indicate
certain features of the surface requiring further study. For example the
fitted surface might suggest the existence of a local maximum or some
stationary or rising ridge system.

(iv) Further confirmatory points at positions indicated by the first 
fitted surface would therefore be added and the surface refitted including 
the information from these extra points. This may usually be done 
with least labor by employing a technique due to Plackett [4], and [1]. 

(v) If it appeared that the second degree equation was such a poor 
fit as to be an inadequate model then steps would be taken to fit a 
third degree equation. Extra constants may be added to the model 
without undue recalculation following a method given by Box and Hunter 
[5]. 
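
Step (i) of the procedure above can be sketched as follows. In standardized units the direction of steepest ascent is proportional to the fitted first order coefficients, so trial points are laid out along that direction from the center of the design. The function name, the step length and the number of points are assumptions made for illustration; in practice the experimenter chooses the spacing and stops when no further gain is obtained.

```python
import math


def steepest_ascent_path(b, step=1.0, n_points=5):
    """Points along the direction of steepest ascent from the design center.

    b        -- fitted first order coefficients (b1, ..., bk), standardized units
    step     -- distance between successive trial points (hypothetical choice)
    n_points -- number of trial points to list
    """
    norm = math.sqrt(sum(bi * bi for bi in b))
    if norm == 0.0:
        raise ValueError("the fitted plane is flat; there is no direction of ascent")
    direction = [bi / norm for bi in b]
    return [[m * step * di for di in direction] for m in range(1, n_points + 1)]


# Hypothetical fitted slopes b1 = 3, b2 = 4: the path moves 0.6 units in x1
# for every 0.8 units in x2.
path = steepest_ascent_path([3.0, 4.0], step=1.0, n_points=3)
for point in path:
    print(point)
```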

The basic ideas in these procedures are not of course new. In 
particular both the device of steepest ascent and that of fitting equations 
in the neighborhood of maxima have been employed in a problem 
closely analogous to ours, that of solving mathematical equations 
by approximate methods (see for example Booth [6] and Koshal [7]).
Their application to the present problem, in the manner we propose, 
does however appear to be novel. 


6. Experimental Designs.

Each combination of levels of the factors corresponds to a point in 
the factor space and the pattern of such points used to elucidate the 
surface is called the experimental design. For example, the sets of 
four points in Figures 3, 4, and 5 correspond to the $2^2$ factorial design,
and the eight points in Figure 7 to the $2^3$ factorial.




It will be understood that arbitrary patterns of experimental points 
might fail not only to provide accurate estimates of the constants but 
might not even allow certain constants to be separately estimated at all. 
When attempting to obtain a best distribution of experimental points 
for the purpose of fitting a polynomial equation three considerations 
must be kept in mind. 


(i) On the assumption that a polynomial of the degree assumed can 
adequately represent the surface in the region examined, the design 
should be such that the errors of estimate of points on the fitted surface 
should be small. 

(ii) The biases in the estimated coefficients, which might occur if
the assumed equation were representationally inadequate, should be 
small. 

(iii) If possible, provision should be made to estimate certain of 
the coefficients or combinations of them of higher order than those in
the assumed form of equation to be fitted. Study of these sample 
members of estimated higher order effects will provide some indication 
of whether the assumption that these terms can be ignored is a reason- 
able one or not. 


The number N of experimental points needed to determine L constants
cannot be less than L and we should normally require that it
should exceed this number. If N were equal to L the process of fitting
would inevitably force the N values $Y_1, Y_2, \ldots, Y_N$ predicted by the
fitted equation to agree exactly with the observed values $y_1, y_2, \ldots, y_N$.
In this circumstance we could, of course, draw no conclusion at all as
to whether the fitted surface were really representing the true surface
or not. If N were greater than L, however, the N values $Y_1, Y_2, \ldots, Y_N$
predicted by the fitted equation at the N sets of conditions would differ
from the N observed values $y_1, y_2, \ldots, y_N$. The sum of squares of
the discrepancies $S = \sum_{i=1}^{N} (y_i - Y_i)^2$ (which is of course the quantity
minimized in the “least squares” procedure) is called the residual sum
of squares. If this quantity is divided by $N - L$ (called the residual
number of degrees of freedom) we have an estimate of $\sigma^2$, the experimental
error variance, provided that the real surface can be represented
by a function of the form assumed.

If the real surface cannot be approximately represented by an 
equation of the form assumed S/(N — L) tends on the average to exceed 
$\sigma^2$ by an amount which depends on the magnitude of the constants
which have been ignored and on the sensitivity of the design chosen 
to the departures from assumption which have occurred. In general 


the design will be sensitive to the existence of those higher order terms 
which can be separately estimated in (iii).

If therefore some estimate of the experimental error variance is 
available (this might be quite a rough estimate from previous experience 
or some more precise estimate from replication of experiments), and 
if the design is suitably chosen, comparison of S/(N — L) with the esti- 
mate enables the experimenter to obtain some conception of the goodness 
of fit of the postulated equation; that is to say, its representational 
adequacy in the particular circumstance of the experiment. A good 
fit does not necessarily imply that S/(N — L) is small but only that it 
is of the magnitude anticipated for the experimental error variance. 
If S/(N — L) is much larger than the experimental error variance this 
points to the necessity for including terms of higher order. It should 
be emphasized again that the sensitivity of this ‘goodness of fit test’ 
to particular types of departure from the assumed form of equation 
depends on the design used. A more extensive discussion of this question 
will be found in [5]. 
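
The comparison described above can be sketched numerically. The code computes the residual sum of squares per residual degree of freedom and sets it against a prior estimate of the error variance. The observations, the fitted values and the prior variance below are all hypothetical and serve only to show the arithmetic; they are not taken from the paper.

```python
def residual_mean_square(observed, predicted, n_constants):
    """Return S / (N - L), the residual sum of squares per residual degree of freedom."""
    n = len(observed)
    if n <= n_constants:
        raise ValueError("need N > L to leave residual degrees of freedom")
    s = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return s / (n - n_constants)


# Hypothetical case: six runs and a first degree equation in two factors (L = 3).
observed = [54.0, 60.0, 58.0, 66.0, 59.0, 61.0]
predicted = [53.5, 60.5, 58.5, 65.5, 59.5, 59.5]  # assumed fitted values
rms = residual_mean_square(observed, predicted, n_constants=3)

prior_variance = 1.0  # assumed prior estimate of the experimental error variance
ratio = rms / prior_variance
print(rms, ratio)
```

A ratio near one is consistent with an adequate equation; a ratio much greater than one points to terms of higher order, subject to the sensitivity of the design as noted in the text.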

It is encouraging to find that experimental design is a field where 
virtue is rewarded, for designs which satisfy the criteria mentioned above 
usually result in extremely simple calculations. 

The designs employed in [1] for determining the best fitting equation 
of first degree were the two-level factorials and fractional factorials. 
These designs provide efficient estimates of first order effects and also 
allow certain of the second order effects (some or all of the two-factor 
interactions) to be examined. This examination provides some indica- 
tion of whether a model in which second order terms are ignored is ade- 
quate or not. A more fundamental consideration of the problem of first 
order design is given in [8]. 

Designs for determining the best fitting equation of second degree 
without undue expenditure of experiments were developed empirically 
in [1]. In these a two level factorial or fractional factorial which allowed 
the estimation of all linear and two factor interaction terms was aug- 
mented with further points which allowed the quadratic effects to be 
determined also. These designs were called composite designs. 

When it appeared that the design was symmetrically placed with 
respect to the region it was desired to study, a “central” composite
design was used (illustrated for three variables in Figure 8). When 
the region of interest lay towards one corner of the factorial design, it 
was augmented in that region with further points to form a non-central 
composite design (illustrated for three variables in Figure 9).

A more fundamental consideration of this design problem has
recently been undertaken [9]. This study indicates that the composite
designs which have, by now, been used in many practical examples are
reasonably efficient but are capable of further improvement. The
original composite designs are employed in the examples that follow in
§§8, 9, and 10.


FIGURE 8 
CENTRAL COMPOSITE DESIGN 


FIGURE 9 
NON-CENTRAL COMPOSITE DESIGN 




7. Canonical Analysis of the Fitted Second Degree Equation.


One of the most striking things to come out of the application of the 
technique described has been the power of elucidation afforded by the 
analysis of the second degree equation after a near stationary region 
was attained. This technique of the fitting and analysis of a second 
degree equation has been effective when marked factor dependence 
made the system difficult or impossible to study by other means. This 
we shall illustrate in the next section with a number of actual examples. 
Before we consider these, we explain briefly the basis of the method of 
analysis of the second degree equation. This is essentially the reduction 
of the fitted equation to canonical form. This is, of course, a standard 
process which is given in text books in coordinate geometry; however,
it may be of value to explain it here essentially from the point of view 
of our problem. 

Suppose the fitted second degree equation for the two variables $x_1$
and $x_2$ is

\[ Y = b_0 + b_1 x_1 + b_2 x_2 + b_{11} x_1^2 + b_{22} x_2^2 + b_{12} x_1 x_2 \tag{3} \]


The coefficients $b_1$ and $b_2$ are called the linear effects, $b_{11}$ and $b_{22}$ the
quadratic effects, and $b_{12}$ the interaction effect. By mere inspection
of the coefficients of equation (3) we usually could not perceive the type
of surface which it represented. Still less could we do this when more 
than two variables were involved. Our analysis consists essentially of 
restating the information contained in the coefficients of the original 
equation in another more readily comprehended form. 

There are basically only two types of surface which this second
degree equation can represent. The contours of these two basic surfaces
are shown in Figures 10a and 10b and we shall later show that the
surfaces of 10c and 10d may be regarded as special forms of these.
For the surfaces of 10a and 10b, if we take the center of the system (S)
as origin and the principal axes $X_1$ and $X_2$ of the system as axes of a
new coordinate system, it will be seen by inspection that their equations
in the new coordinate system will reduce to the form

\[ Y - Y_S = B_{11} X_1^2 + B_{22} X_2^2 \tag{4} \]

where $Y_S$ is the predicted response at S. By this transformation we
have reduced the equation to one with only quadratic effects $B_{11}$ and $B_{22}$
(measured along the axes $X_1$ and $X_2$).


(i) If $B_{11}$ and $B_{22}$ are of the same sign we have a contour diagram
like Figure 10a with elliptical contours. In this figure $B_{11}$ and $B_{22}$ are
both negative and the center of the system S is a maximum. If the
coefficients are both positive, the surface is like a trough instead of a
mound and S is a minimum.

(ii) If $B_{11}$ and $B_{22}$ are of opposite signs we have a saddle point
(sometimes called a col or minimax) illustrated in Figure 10b.




FIGURE 10. FUNDAMENTAL AND LIMITING SURFACES GENERATED BY A SECOND 
DEGREE EQUATION IN TWO DIMENSIONS 


In either case if one of the coefficients is small in magnitude compared
with the other then the surface is attenuated along the axis corresponding
to the small coefficient. In Figure 10a, $B_{22}$ is smaller in magnitude than $B_{11}$.


(iii) If $B_{22}$ were zero we would have the stationary ridge surface
shown in Figure 10c (which might be regarded as a surface like that in
Figure 10a or 10b infinitely attenuated along the $X_2$ axis).

(iv) If $B_{22}$ were zero and the center of the system were at infinity




we would have the rising ridge surface shown in Figure 10d in which 
the contours of the ridge were parabolas. 


Denoting the coefficients ($B_{11}$, $B_{22}$) in the canonical equation by -, 0,
or + the systems 10a, b, c, and d are typified by (- -), (- +), (- 0)
and (- 0, centre at infinity).
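
The classification by signs can be sketched for two variables. The canonical coefficients are the two eigenvalues of the symmetric matrix formed from the quadratic and interaction effects, which have a closed form in the two-variable case. The function names and the tolerance `tol` are illustrative assumptions; in practice a coefficient would be judged "zero" against its standard error rather than a fixed tolerance. The pair is returned in ascending order and is not tied to any particular labelling of the axes.

```python
import math


def canonical_2d(b11, b22, b12):
    """Canonical coefficients of b11*x1^2 + b22*x2^2 + b12*x1*x2, in ascending order."""
    mean = (b11 + b22) / 2.0
    radius = math.hypot((b11 - b22) / 2.0, b12 / 2.0)
    # Eigenvalues of the symmetric matrix [[b11, b12/2], [b12/2, b22]].
    return mean - radius, mean + radius


def classify(c1, c2, tol=1e-9):
    """Name the surface type from the signs of the two canonical coefficients."""
    signs = sorted((c > tol) - (c < -tol) for c in (c1, c2))
    if signs == [-1, -1]:
        return "maximum"
    if signs == [1, 1]:
        return "minimum"
    if signs == [-1, 1]:
        return "saddle"
    return "ridge (limiting case)"


print(classify(*canonical_2d(-2.0, -1.0, 0.0)))   # both coefficients negative
print(classify(*canonical_2d(-1.0, -1.0, -4.0)))  # large interaction, opposite signs
```

Whether a limiting case is a stationary or a rising ridge depends on the position of the centre, which this sketch does not examine.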

We see that the fitted surfaces shown in Figures 10a, 10c, and 10d
could provide valuable approximations for the systems discussed in
§1 and illustrated in Figures 3, 4, and 5. Figures 10c and 10d represent
essentially limiting cases; for example Figure 10c represents the momentary
situation between Figures 10a and 10b when the sign of $B_{22}$ changes
from negative to positive. These limiting cases would seldom if ever
occur exactly in practice. When the underlying surface is like that in
Figure 4 or in Figure 5 we obtain an attenuated form of one of the
basic surfaces of Figures 10a and 10b which approaches one of the limiting
cases.

In general if one coefficient is small compared with the other in the 
canonical equation a ridge of some kind is indicated. If the center is 
near the center of the design we have the central part of a system which 
is an attenuated form of 10a or 10b. In either case the system will
approximate to the stationary ridge in Figure 10c. 

If one coefficient is small compared with the other but the center is 
remote from the design then a sloping ridge of the kind illustrated in 
type 10d is usually indicated (although it could happen that the slope 
of the ridge was so slight that the situation was most closely approxi- 
mated by 10c). We cannot, of course, draw any conclusions about 
the nature of the surface at the remote point corresponding to the center. 
We can however use the calculated center as a construction point 
which, together with the information concerning the directions of the 
axes will enable us to determine the approximate nature of the local surface
in the region where it has actually been fitted. Suppose $B_{22}$ is
the small coefficient; then, since we have already liquidated all large
first order effects, the axis $X_2$ corresponding to the coefficient $B_{22}$
would normally be found to pass close to the design. We therefore
take a new origin S' on $X_2$ close to the center of the design. Referred
to this new origin the coordinates are $X_1$ and $X_2' = X_2 - a$. Substituting
the latter relation in (4) we obtain the equation in the form

\[ Y - Y_{S'} = B_2' X_2' + B_{11} X_1^2 + B_{22} X_2'^{2} \tag{5} \]

where $Y_{S'}$ is the value of Y at the new origin,

\[ Y_{S'} = Y_S + a^2 B_{22}, \]

and

\[ B_2' = 2 a B_{22} \]

is the slope of the ridge $X_2$ at S'.

If $B_2'$ is very small therefore, we would have an almost stationary
ridge like Figure 10c. If (as is usually the case when S is remote) $B_2'$ is
not small we have a rising ridge, the gain in response being about equal
to $B_2'$ for unit movement up the ridge. We see that in this case we are
again approximating to the surface by an attenuated form of either of 
the surfaces in Figures 10a or 10b, but the part of the surface used is 
not at the center but at some point along the axis of attenuation. 
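
The algebra behind the shifted origin may be written out in full. The following is a reconstruction from the canonical form $Y - Y_S = B_{11} X_1^2 + B_{22} X_2^2$ and the substitution $X_2 = X_2' + a$, where $a$ is the distance from the remote centre S to the new origin S' measured along $X_2$; the symbols $Y_{S'}$ and $B_2'$ follow the definitions in the text.

```latex
\begin{aligned}
Y - Y_S &= B_{11} X_1^2 + B_{22} (X_2' + a)^2 \\
        &= B_{11} X_1^2 + B_{22} X_2'^{2} + 2 a B_{22} X_2' + a^2 B_{22}, \\[4pt]
Y - \underbrace{(Y_S + a^2 B_{22})}_{Y_{S'}}
        &= \underbrace{2 a B_{22}}_{B_2'} X_2' + B_{11} X_1^2 + B_{22} X_2'^{2}.
\end{aligned}
```

When $B_{22}$ is small but $a$ is large, the product $2 a B_{22}$ need not be small, which is why a remote centre usually signals a rising rather than a stationary ridge.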

In general our analysis of k dimensional second degree fitted surfaces 
will follow the same lines. 


From the $\tfrac{1}{2}(k + 1)(k + 2)$ coefficients of the original equation we
calculate


(i) The coordinates of the new center $x_{1S}, x_{2S}, \ldots, x_{kS}$ and the
value $Y_S$ of the response at this point,
(ii) the canonical form of the equation

\[ Y - Y_S = B_{11} X_1^2 + B_{22} X_2^2 + \cdots + B_{kk} X_k^2 \]

which contains only k coefficients.

(iii) The k equations of the new coordinates (X's) in terms of the
old coordinates (x's).
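
In matrix notation the three calculations take a compact form. The notation below is assumed for this sketch and is not used in the paper: $x$ and $b$ are the column vectors of coordinates and linear effects, and $C$ is the symmetric matrix with the quadratic effects $b_{ii}$ on the diagonal and half the interaction effects, $b_{ij}/2$, off the diagonal.

```latex
\begin{aligned}
Y &= b_0 + x^{T} b + x^{T} C x
  && \text{fitted second degree equation} \\
x_S &= -\tfrac{1}{2}\, C^{-1} b, \qquad Y_S = b_0 + \tfrac{1}{2}\, x_S^{T} b
  && \text{(i) centre and response at the centre} \\
C &= M \Lambda M^{T}, \qquad \Lambda = \operatorname{diag}(B_{11}, \ldots, B_{kk})
  && \text{(ii) canonical coefficients as eigenvalues of } C \\
X &= M^{T} (x - x_S), \qquad Y - Y_S = \sum_{i=1}^{k} B_{ii} X_i^2
  && \text{(iii) new coordinates, with } M \text{ orthogonal}
\end{aligned}
```

The count of coefficients agrees with the text: one constant, $k$ linear effects and $\tfrac{1}{2}k(k+1)$ second order effects give $\tfrac{1}{2}(k+1)(k+2)$ in all.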

Figure 11 shows some of the possible three dimensional surfaces
generated by second degree equations. Thus in Figure 11a the ellipsoid
illustrated is intended to represent one of a series fitting one inside the
other. The centre thus corresponds to a point maximum if the response
is decreasing on moving away from the centre, or a point minimum in
the contrary case. Denoting as before by -, 0 or + the values of
the coefficients ($B_{11}$, $B_{22}$, $B_{33}$) in the canonical equation the system
11a is typified by (- - -) for a maximum or (+ + +) for a minimum.
We need consider only one of the alternative possibilities for this and
the other examples. We then have 11a (- - -), 11b (- - 0),
11c (- - +), 11d (- 0 0), 11e (- 0 +), 11f (- 0 0, centre at infinity),
11g (- - 0, centre at infinity).


SOME EXAMPLES OF SURFACES MET IN PRACTICE 


Rather more than half the examples so far studied have shown
marked ridge systems of one form or another whilst most of the remain- 
ing examples have shown factor dependence of a less severe kind. The 




FIGURE 11. CONTOURS OF SOME POSSIBLE SURFACES GENERATED BY A SECOND 
DEGREE EQUATION IN THREE DIMENSIONS 


following three examples of ridge systems were selected because of the 
interesting surfaces they reveal. 


8. An approximate stationary plane ridge in three variables. 


The detailed calculations for this example have been given in [2].* 
It concerns a reaction of the type 


A + B → C followed by A + C → D


in which two reactants A and B formed a mixture of C and D. The
object was to obtain the maximum for C subject to the condition that
the yield of D should not exceed 20% (more than this amount would
cause difficulty in purification). The quantity of B used was kept
constant throughout, the factors varied being temperature (T), the
concentration of A (c), and time of the reaction (t). Preliminary experimentation
had led to the levels T = 167°C, c = 27.5%, t = 6.5 hours.


The levels of the factors used in the experiments to be described are 


*The “natural units” given as an example in the book differ slightly from those given here. 




listed below. The initial design consisted of a $2^3$ factorial. It is convenient
to regard the levels of $x_1$, $x_2$ and $x_3$ used in this design as ±1
from which the relationships between standardized variables $x_1$, $x_2$, $x_3$
and natural variables T, c and t follow.


Factor levels in units of the design:           -2      -1       0       1       2

Factor levels   T, Temperature (°C)            157     162     167     172     177
in natural      c, Concentration of A (%)      22.5    25      27.5    30      32.5
units           t, Time of reaction (hours)     3.5     5       6.5     8       9.5


The standardized variables $x_1$, $x_2$, $x_3$ (that is the factor levels
scaled in units of the factorial design) may be expressed in terms of the
natural variables as follows:

\[ x_1 = (T - 167)/5, \qquad x_2 = (c - 27.5)/2.5, \qquad x_3 = (t - 6.5)/1.5 \tag{6} \]
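
The change of units can be sketched as a pair of conversion functions. The centres (167 °C, 27.5 %, 6.5 hours) and units (5 °C, 2.5 %, 1.5 hours) are read from the table of factor levels above; the names `CODING`, `to_standardized` and `to_natural` are illustrative and do not come from the paper.

```python
# (centre, unit) for each natural variable, read from the table of factor levels.
CODING = {"T": (167.0, 5.0), "c": (27.5, 2.5), "t": (6.5, 1.5)}
ORDER = ("T", "c", "t")


def to_standardized(T, c, t):
    """Convert natural levels (deg C, %, hours) to design units (x1, x2, x3)."""
    natural = dict(zip(ORDER, (T, c, t)))
    return tuple((natural[k] - CODING[k][0]) / CODING[k][1] for k in ORDER)


def to_natural(x1, x2, x3):
    """Convert design units (x1, x2, x3) back to natural levels (T, c, t)."""
    coded = dict(zip(ORDER, (x1, x2, x3)))
    return tuple(CODING[k][0] + CODING[k][1] * coded[k] for k in ORDER)


print(to_standardized(172.0, 30.0, 8.0))  # upper level of the factorial
print(to_natural(-2, -2, -2))             # lowest level used for each factor
```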


The first eight trials listed in Table 2 comprise the factorial design.
On the assumption that a second degree equation provided an adequate
model, unbiased estimates of the linear and interaction coefficients
were obtained as follows:

    $b_1$ =  1.76      $b_{12}$ = -3.09
    $b_2$ =  1.19      $b_{13}$ = -2.19
    $b_3$ = -0.01      $b_{23}$ = -1.21


Prior information suggested that the experimental error standard 
deviation was about 1% from which the standard errors for the b’s 
would be about 0.4. The three first order effects were small compared 
with the two factor interactions whence it was concluded that a near- 
stationary region had been reached and seven further experiments 
(points 9-15 in Table 2) were added to complete a second order com- 
posite design. Estimates of all the coefficients in the second order 
equation could now be readily calculated. The fitted second degree 
equation thus obtained was 


\[
\begin{aligned}
Y = 57.71 &+ 1.94 x_1 + 0.91 x_2 + 1.07 x_3 - 1.54 x_1^2 \\
          &- 0.26 x_2^2 - 0.68 x_3^2 - 3.09 x_1 x_2 - 2.19 x_1 x_3 - 1.21 x_2 x_3
\end{aligned} \tag{7}
\]


The standard errors for linear and quadratic effects were about 0.3 
and those for interaction effects about 0.4. 

An estimate of the experimental error standard deviation calculated
from the residual sum of squares and based on five degrees of
freedom was 1.5. This suggested that the fit of the equation was
reasonably good.

TABLE 2
LEVELS OF EXPERIMENTAL VARIABLES AND RESULTS OBTAINED

                  Trial     x1      x2      x3     Yield of C
                    1       -1      -1      -1     (missing)
                    2       -1      -1       1       53.3
                    3       -1       1      -1       57.5
 2^3 factorial      4       -1       1       1       58.8
                    5        1      -1      -1       60.6
                    6        1      -1       1       58.0
                    7        1       1      -1       58.6
                    8        1       1       1       52.6
 Additional         9        0       0       0       56.9
 points to         10        2       0       0       55.4
 form a            11       -2       0       0       46.9
 central           12        0       2       0       57.5
 composite         13        0      -2       0       55.0
 design            14        0       0       2       58.9
                   15        0       0      -2       50.3
 Confirmatory      16        2      -3       0       59.4
 points            17        2      -3       0       61.5
                   18       -1.4     2.6     0.7     59.5
                   19       -1.4     2.6     0.7     58.5
The fitted surface was now reduced to canonical form. 
(i) The coordinates $x_{1S}$, $x_{2S}$, $x_{3S}$ of the center point S of the system
and the predicted yield $Y_S$ at this point were respectively

\[ x_{1S} = 0.061, \quad x_{2S} = 0.215, \quad x_{3S} = 0.499, \quad \text{and} \quad Y_S = 58.14 \tag{8} \]

(ii) The canonical form of the fitted second degree equation (7) was

\[ Y - 58.14 = -3.19 X_1^2 - 0.07 X_2^2 + 0.78 X_3^2 \tag{9} \]

(iii) The new coordinates ($X_1$, $X_2$, $X_3$) for any point are given in
terms of the old coordinates ($x_1$, $x_2$, $x_3$) by equations which may be
written

            (x_1 - 0.061)   (x_2 - 0.215)   (x_3 - 0.499)
    X_1        0.7511          0.4884          0.4443
    X_2        0.3066          0.3383         -0.8897         (10)
    X_3        0.5848         -0.8044         -0.1044




The entries in the rows are the coefficients in the equations which
express the X's in terms of the x's. Thus the first such equation is

\[ X_1 = 0.7511(x_1 - 0.061) + 0.4884(x_2 - 0.215) + 0.4443(x_3 - 0.499) \tag{11} \]

Because the transformation is orthogonal the entries in the columns
of the same table are the coefficients in the equations which express the
x's in terms of the X's. Thus $x_1$ is expressed in terms of $X_1$, $X_2$, and
$X_3$ by the equation

\[ (x_1 - 0.061) = 0.7511 X_1 + 0.3066 X_2 + 0.5848 X_3 \tag{12} \]

whose coefficients are taken from the first column of the table.
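
The reduction to canonical form can be checked numerically from the rounded coefficients of the fitted equation (7): linear effects 1.94, 0.91, 1.07; quadratic effects -1.54, -0.26, -0.68; interactions -3.09, -2.19, -1.21; constant 57.71. The sketch below is one way of doing the calculation and is not the procedure used in the paper. It solves for the stationary point by Cramer's rule and obtains the canonical coefficients as eigenvalues of the symmetric matrix of second order effects, using the standard trigonometric formula for a 3 x 3 symmetric matrix. All function names are illustrative.

```python
import math

# Rounded coefficients of the fitted second degree equation (7).
b0 = 57.71
b = [1.94, 0.91, 1.07]
b11, b22, b33 = -1.54, -0.26, -0.68
b12, b13, b23 = -3.09, -2.19, -1.21

# Symmetric matrix of second order effects (interaction effects are halved).
second_order = [[b11, b12 / 2, b13 / 2],
                [b12 / 2, b22, b23 / 2],
                [b13 / 2, b23 / 2, b33]]


def det3(m):
    """Determinant of a 3 x 3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve m x = rhs for a 3 x 3 system by Cramer's rule."""
    d = det3(m)
    return [det3([[rhs[i] if c == j else m[i][c] for c in range(3)]
                  for i in range(3)]) / d
            for j in range(3)]


def sym_eigenvalues3(m):
    """Eigenvalues of a symmetric 3 x 3 matrix (not a multiple of I), ascending."""
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    shifted = [[m[i][j] - (q if i == j else 0.0) for j in range(3)] for i in range(3)]
    p = math.sqrt(sum(shifted[i][j] ** 2 for i in range(3) for j in range(3)) / 6.0)
    scaled = [[shifted[i][j] / p for j in range(3)] for i in range(3)]
    r = max(-1.0, min(1.0, det3(scaled) / 2.0))
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return smallest, 3.0 * q - largest - smallest, largest


# Stationary point: the gradient b + 2 C x vanishes, so C x = -b / 2.
center = solve3(second_order, [-bi / 2 for bi in b])
y_center = b0 + 0.5 * sum(bi * xi for bi, xi in zip(b, center))
canonical = sym_eigenvalues3(second_order)
print(center, y_center, canonical)
```

Because the published coefficients are rounded to two decimals and one canonical coefficient is close to zero, the centre computed this way agrees with (8) only to within a few hundredths, while the response at the centre and the canonical coefficients agree with (8) and (9) more closely.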

Inspection of the canonical form (9) of the equation shows that the
first term is the predominant one. Any movement away from the
center point S in the direction of the $X_1$ axis will lead to a rapid drop
in yield, but changes in both $X_2$ and $X_3$ can be made with considerably
smaller effects on yield. We note however that the coefficient of $X_3^2$
is positive so that there is a suggestion that further gains might be made
by moving away from the center point S in either direction along this
axis. Additional experiments (16) to (19) in these directions were
conducted therefore and the new results added in to the solution;
the newly calculated canonical equation was then

\[ Y - 59.23 = -3.51 X_1^2 - 0.25 X_2^2 + 0.24 X_3^2 \]


Bearing in mind that the standard errors of the coefficients in this equation
are of roughly the same magnitude as those of the quadratic effects
in the original equation (i.e. about 0.4) we see that the surface approximated
to that in Figure 11d in which there is a plane of maximum
passing through the axes $X_2$ and $X_3$. Subsequent experiments at
various points on the plane confirmed that the surface was of this type.
The plane stationary ridge system to which the surface approximates
is shown in Figure 12. In this example we see that we have two dimensional
stationary ridges of the type pictured in Figure 4 in all three
pairs of variables at once. The yield is at its maximum value of nearly
60% on the second of the three planes, and on each side of it are shown
planes on which the yield is about 50%.

The location of the maximum plane was little changed by the addition
of the new observations. Points on the plane will be all those for
which $X_1$ is zero so that the equation of the plane in terms of $x_1$, $x_2$,
and $x_3$ will be obtained by putting equation (11) equal to zero. Equations
(6) may now be used to get the equation of the plane in terms of
the original variables T, c, and t. We find

\[ 0.1502 T + 0.1954 c + 0.2962 t = 32.76 \tag{13} \]

which defines the set of alternative conditions giving approximately
the same maximum yield.


FIGURE 12. CONTOURS OF PLANE STATIONARY RIDGE SYSTEM TO WHICH SURFACE
APPROXIMATES (axes: temperature, concentration, time)

The situation is seen to be that, over the range considered, the three 
variables temperature, concentration and time are approximately 
“compensating”. For example if we had a process working at one 
point on the ridge and we wished to change one or possibly two of the 
factor levels so that the maximum yield was approximately maintained 
this could be done by changing the remaining variable, or variables so 
that equation (13) was satisfied. Thus if we wished to reduce the 
reaction time, the equation would indicate the higher values of tem- 
perature and/or concentration necessary to compensate the change. 
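
The plane of alternative conditions and its use for compensation can be sketched as follows. The code sets the first canonical variable to zero, using the coefficients 0.7511, 0.4884, 0.4443 and the centre (0.061, 0.215, 0.499), and converts to natural units with centres 167, 27.5, 6.5 and units 5, 2.5, 1.5. It then solves the resulting plane for the concentration that compensates a chosen temperature and time. The function name and the example settings are hypothetical, and the result is meaningful only within the range of conditions actually explored.

```python
# Coefficients of X1 in terms of (x1, x2, x3) and the centre of the system.
ROW_X1 = (0.7511, 0.4884, 0.4443)
CENTER = (0.061, 0.215, 0.499)
# (centre, unit) of the natural variables T (deg C), c (%) and t (hours).
CODING = ((167.0, 5.0), (27.5, 2.5), (6.5, 1.5))

# X1 = sum_j m_j * ((u_j - mid_j) / unit_j - xs_j) = 0 rearranges to
# sum_j (m_j / unit_j) * u_j = sum_j m_j * (mid_j / unit_j + xs_j).
coeffs = tuple(m / unit for m, (_, unit) in zip(ROW_X1, CODING))
rhs = sum(m * (mid / unit + xs) for m, (mid, unit), xs in zip(ROW_X1, CODING, CENTER))


def compensating_concentration(T, t):
    """Concentration c (%) that keeps the point (T, c, t) on the fitted plane."""
    a_T, a_c, a_t = coeffs
    return (rhs - a_T * T - a_t * t) / a_c


print(coeffs, rhs)
# Hypothetical question: at 167 deg C, what concentration compensates a
# reduction of the reaction time from 6.5 to 5 hours?
print(compensating_concentration(167.0, 6.5), compensating_concentration(167.0, 5.0))
```

The shorter time calls for a higher concentration, in line with the compensating behavior described in the text.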

In Figure 13 these approximate alternatives are shown on a tri- 
coordinate diagram. This is essentially the maximum plane of Figure 
12 with the intersections of the coordinate planes drawn in. 

The detection of ridge systems of this sort is extremely important
in practice since when such systems exist they provide the experimenter
with a number of alternative processes he can use, some of which may
be very much cheaper or more convenient than others. Furthermore
the existence of a ridge allows points to be chosen so that auxiliary
responses, such as purity, are brought to their most satisfactory levels.

FIGURE 13. ALTERNATIVE PROCESSES GIVING MAXIMUM YIELD WITH SHADED
REGION IN WHICH BY-PRODUCT GREATER THAN 20% (axes: temperature in degrees
centigrade, concentration in percent, time in hours)

In the present investigation a second surface was fitted for the amount 
of by-product D which it was desired to maintain below the level of 
20%. The 20% contour line is drawn on the figure, the shaded region is 
where amounts of by-product greater than 20% are produced. By 
choosing a point in the unshaded region the experimenter may obtain 
both maximum yield of C and a satisfactorily low yield of D. 


9. A two dimensional ridge system in five variables. 


Thus far in the paper we have discussed situations in which there 
were only one, two or three factors and the surface could be pictured 
geometrically. The technique of analysis of factor dependence by 
canonical analysis of the fitted equation becomes even more valuable 
when there are more than three factors for it enables the situation to 
be readily appreciated even though it cannot so easily be perceived 
geometrically. The following example is one in which five factors 
were studied and where a preliminary application of the steepest ascent 
procedure had (as it turned out) brought the experimenter close to a 
near-stationary region. 

The process studied occurred in two stages and the factors considered 
were the temperatures and times of reaction at the two stages and the 
concentration of one of the reactants at the first stage. Evidence was 
available which suggested that the times of reaction were best varied 
on a logarithmic scale and it was known that the second reaction time 
needed to be varied over a wide range. 


The table below shows the levels used in the experiments to be de- 
scribed. 




Factor levels in units of the design                    -3      -1       1       3 

Factor levels in natural units 
  Stage 1   T_1  Temperature (°C)                               105     120     135 
            t_1  Time of reaction (hours)              1.25     2.5       5 
            c    Concentration (%)                       40      60      80 
  Stage 2   T_2  Temperature (°C)                                25      40      55 
            t_2  Time of reaction (hours)                       0.25    1.25    6.25 


From the above table it will be seen that the standardized variables 
to be used in the fitted equation, expressed in terms of the natural 
variables, were 

    x_1 = (T_1 - 112.5)/7.5,    x_2 = 2(\log t_1 - \log 5)/\log 2 + 1, 
    x_3 = (c - 70)/10,          x_4 = (T_2 - 32.5)/7.5,                  (14) 
    x_5 = 2(\log t_2 - \log 1.25)/\log 5 + 1. 
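The coding in (14) can be sketched as a pair of conversion functions. This is an illustrative sketch only: the function names are hypothetical, and the coding of x_1 assumes first-stage temperatures of 105 and 120 °C at the levels -1 and +1.

```python
import math

def standardize(T1, t1, c, T2, t2):
    """Natural units -> design units, following the form of equation (14)."""
    x1 = (T1 - 112.5) / 7.5   # assumed coding for the first-stage temperature
    x2 = 2 * (math.log(t1) - math.log(5)) / math.log(2) + 1
    x3 = (c - 70) / 10
    x4 = (T2 - 32.5) / 7.5
    x5 = 2 * (math.log(t2) - math.log(1.25)) / math.log(5) + 1
    return (x1, x2, x3, x4, x5)

def to_natural(x1, x2, x3, x4, x5):
    """Design units -> natural units (inverse of standardize)."""
    T1 = 112.5 + 7.5 * x1
    t1 = 5 * 2 ** ((x2 - 1) / 2)
    c = 70 + 10 * x3
    T2 = 32.5 + 7.5 * x4
    t2 = 1.25 * 5 ** ((x5 - 1) / 2)
    return (T1, t1, c, T2, t2)

# Upper and lower factorial levels of every factor
high = standardize(120, 5, 80, 40, 1.25)
low = standardize(105, 2.5, 60, 25, 0.25)
```

Because the two reaction times enter through logarithms, equal steps in x_2 or x_5 correspond to equal ratios of time rather than equal differences.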


A half replicate of a 2^5 factorial experiment was first performed using 
the levels corresponding to -1 and +1 in the standardized variables. 
The defining contrast characterizing the fractional design was the single 
five factor interaction. The experiments in this design are the first 
16 in Table 3. 

On the assumption that the surface may be adequately represented 
by an equation of second degree, the five first order terms b_1, b_2, ..., b_5 
and the ten two-factor interaction terms b_{12}, b_{13}, ..., b_{45} may now be 
estimated without bias. 

The values obtained were as follows, 

    b_1 = 3.194    b_{12} = -1.869    b_{24} = -0.169 
    b_2 = 1.456    b_{13} =  2.081    b_{25} = -0.769 
    b_3 = 0.981    b_{14} = -0.506    b_{34} = -4.019 
    b_4 = 3.419    b_{15} = -0.431    b_{35} = -0.744 
    b_5 = 1.419    b_{23} =  0.744    b_{45} =  0.144 
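The estimates above can be checked with a few lines of arithmetic. The sketch below is illustrative and rests on stated assumptions: the 16 runs are taken in standard order with x_5 = x_1 x_2 x_3 x_4 (one choice consistent with a five-factor defining contrast), and the yield of trial 14 is entered as 63.2, the reading that reproduces the quoted coefficients. The names `design`, `effect`, `linear` and `interaction` are hypothetical.

```python
from itertools import product

# Yields for trials 1-16 (trial 14 entered as 63.2: an assumed reading)
yields = [49.8, 51.2, 50.4, 52.4, 49.2, 67.1, 59.6, 67.9,
          59.3, 70.4, 69.6, 64.0, 53.1, 63.2, 58.4, 64.3]

# Full 2^4 in x1..x4 with x1 changing fastest; x5 is the product of the others
design = []
for x4, x3, x2, x1 in product((-1, 1), repeat=4):
    design.append((x1, x2, x3, x4, x1 * x2 * x3 * x4))

def effect(columns):
    """Mean of yield times the product of the chosen design columns."""
    total = 0.0
    for row, y in zip(design, yields):
        sign = 1
        for j in columns:
            sign *= row[j]
        total += sign * y
    return total / len(yields)

# Five first order coefficients and ten two-factor interaction coefficients
linear = {f"b{i + 1}": effect([i]) for i in range(5)}
interaction = {f"b{i + 1}{j + 1}": effect([i, j])
               for i in range(5) for j in range(i + 1, 5)}
```

Each coefficient is simply a signed average of the 16 yields, which is why the later addition of points off the factorial leaves these estimates unchanged.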


It will be seen that, compared with some of the two-factor inter- 
actions, the first order effects are neither so large as to suggest that we 
are so far removed from a near-stationary region that we must move to 
a new locale and start afresh there to build up the second order design, 
nor are they so small that we should expect the region covered by the 
design to be perfectly centered in the near-stationary region which we 
wish to study further. In these circumstances it seemed appropriate 
to add the further points necessary to allow estimation of all effects 
up to second degree in a “non-central” composite design of the type 
discussed in §6 and in [1]. The idea is to add points on that “corner” 
of the factorial design which is in the direction of increasing yield. 
Although such a design is not so efficient as the centered composite 
design, this type of augmentation will have the effect of shifting the 
center of gravity of the design closer to the center of the near-stationary 
region which we wish to study and thus reduce errors due to “lack of 
fit” of the postulated form of the equation. 

TABLE 3 
LEVELS OF EXPERIMENTAL VARIABLES AND RESULTS OBTAINED 

                          Trial    x_1   x_2   x_3   x_4   x_5    Yield 

Half replicate              1      -1    -1    -1    -1     1     49.8 
of 2^5 factorial            2       1    -1    -1    -1    -1     51.2 
                            3      -1     1    -1    -1    -1     50.4 
                            4       1     1    -1    -1     1     52.4 
                            5      -1    -1     1    -1    -1     49.2 
                            6       1    -1     1    -1     1     67.1 
                            7      -1     1     1    -1     1     59.6 
                            8       1     1     1    -1    -1     67.9 
                            9      -1    -1    -1     1    -1     59.3 
                           10       1    -1    -1     1     1     70.4 
                           11      -1     1    -1     1     1     69.6 
                           12       1     1    -1     1    -1     64.0 
                           13      -1    -1     1     1     1     53.1 
                           14       1    -1     1     1    -1     63.2 
                           15      -1     1     1     1    -1     58.4 
                           16       1     1     1     1     1     64.3 

Additional points          17       3    -1    -1     1     1     63.0 
to form non-central        18       1    -3    -1     1     1     63.8 
design                     19       1    -1    -3     1     1     53.5 
                           20       1    -1    -1     3     1     66.8 
                           21       1    -1    -1     1     3     67.4 

Confirmatory points 

  Canonical variables                                                    Yield    Yield 
  X_1  X_2  X_3  X_4  X_5   Trial    x_1     x_2     x_3     x_4     x_5    found    (predicted) 

   0    0    0    0    0     22      1.23   -0.56   -0.03    0.69    0.70    72.3     (71.4) 
   2    0    0    0    0     23      0.77   -0.82    1.48    1.88    0.77    57.?     (52.6) 
  -2    0    0    0    0     24      1.69   -0.30   -1.55   -0.50    0.62    53.?     (52.6) 
   0    2    0    0    0     25      2.53    0.64   -0.10    1.51    1.12    62.?     (61.1) 
   0   -2    0    0    0     26     -0.08   -1.76    0.04   -0.13    0.27    61.?     (61.1) 
   0    0    2    0    0     27      0.78   -0.06    0.47   -0.12    2.32    64.?     (66.3) 
   0    0   -2    0    0     28      1.68   -1.06   -0.54    1.50   -0.93    63.?     (66.3) 
   0    0    0    2    0     29      2.08   -2.05   -0.32    1.00    1.63    72.?     (69.5) 
   0    0    0   -2    0     30      0.38    0.93    0.25    0.38   -0.24    72.0     (69.5) 
   0    0    0    0    2     31      0.15   -0.38   -1.20    1.76    1.24    70.?     (72.2) 
   0    0    0    0   -2     32      2.30   -0.74    1.13   -0.38    0.15    71.8     (72.2) 

A question mark denotes a decimal digit that is not legible in the source. 

In deciding where best to add the extra points it was noted that the 
highest single yield (70.4%) was recorded at the point (1, -1, -1, 1, 1) 
and this would therefore seem the best point at which to augment the 
design. If we had ignored the second order effects we would have been 
led to use the point (1, 1, 1, 1, 1) since all the first order effects are 
positive. We could not judge accurately the modifying effect of the 
two-factor interaction terms unless quadratic terms were available also; 
however, in view of the (by no means negligible) size of some of the 
interaction terms and the nature of their signs, the point at which 
the highest yield was obtained appeared a reasonable choice and was 
adopted. Five extra points having coordinates (3, -1, -1, 1, 1); 
(1, -3, -1, 1, 1); (1, -1, -3, 1, 1); (1, -1, -1, 3, 1); (1, -1, -1, 1, 3) 
were therefore added; these are numbered 17-21 in Table 3. 
It will be noted that by using this design we were estimating 21 constants 
in 21 experiments. This would not normally be regarded as good 
practice since no comparisons remained to test goodness of fit. However, 
11 confirmatory points were later added to the design so that some check 
on the fit was possible at a later stage. The least squares estimates of 
the linear and interaction effects are unaffected by the new points for this 
type of arrangement. Estimates of b_0, the predicted yield at the center 
of the design, and of the quadratic effects b_{11}, b_{22}, b_{33}, b_{44} and b_{55} 
could now be calculated as follows: 

    b_0 = 68.173,      b_{11} = -1.436,   b_{22} = -1.348, 
    b_{33} = -2.723,   b_{44} = -2.261,   b_{55} = -1.036. 
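With 21 constants and 21 runs the fitted surface passes through every point, so each pure quadratic coefficient can be recovered from the single extra run on its own axis. The sketch below illustrates that arithmetic under stated assumptions: the interaction values b_23 = 0.744 and b_45 = 0.144 and the yield 53.5 for trial 19 are readings assumed here, and the names `base`, `interaction` and `quadratic` are hypothetical.

```python
# First order and interaction coefficients from the factorial runs
b = [3.194, 1.456, 0.981, 3.419, 1.419]
bij = {(0, 1): -1.869, (0, 2): 2.081, (0, 3): -0.506, (0, 4): -0.431,
       (1, 2): 0.744, (1, 3): -0.169, (1, 4): -0.769,
       (2, 3): -4.019, (2, 4): -0.744, (3, 4): 0.144}

base = (1, -1, -1, 1, 1)      # factorial point with the highest yield
y_base = 70.4                 # yield observed there (trial 10)
y_axial = [63.0, 63.8, 53.5, 66.8, 67.4]   # trials 17-21 (53.5 is assumed)

def interaction(i, j):
    """Look up b_ij regardless of index order."""
    return bij[(min(i, j), max(i, j))]

quadratic = []
for i in range(5):
    # Slope along x_i at the base point, all other factors held fixed
    slope = b[i] + sum(interaction(i, j) * base[j] for j in range(5) if j != i)
    step = 2 * base[i]        # x_i moves from +-1 to +-3
    # y_axial - y_base = slope * step + b_ii * (9 - 1)
    quadratic.append((y_axial[i] - y_base - slope * step) / 8.0)
```

The five values in `quadratic` come out close to the quadratic estimates quoted above, which gives a simple check on the transcription of the table.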


We now have estimates of all the 21 constants in the second degree 
equation and we can carry out the canonical analysis as before. The 
coordinates of the center point S of the system and the predicted yield 
at this point are respectively 

    x_{1S} = 1.2263,   x_{2S} = -0.5611,   x_{3S} = -0.0334, 
    x_{4S} = 0.6916,   x_{5S} = 0.6977,    Y_S = 71.38                    (15) 

whilst the canonical form of the equation becomes 

    Y - 71.38 = -4.70X_1^2 - 2.58X_2^2 - 1.25X_3^2 - 0.48X_4^2 + 0.20X_5^2    (16) 
The transformation which gives the new X coordinates in terms of 
the old x coordinates (or by transposition, the old coordinates in terms 
of the new) may be written 

         (x_1 - 1.2263)  (x_2 + 0.5611)  (x_3 + 0.0334)  (x_4 - 0.6916)  (x_5 - 0.6977) 

  X_1       -0.2300         -0.1289          0.7581          0.5953          0.0382 
  X_2        0.6518          0.5994         -0.0343          0.4116          0.2128 
  X_3       -0.2245          0.2489          0.2520         -0.4058          0.8120     (17) 
  X_4        0.4255         -0.7445         -0.1437          0.1561          0.4686 
  X_5       -0.5392          0.0883         -0.5831          0.5359          0.2726 
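The canonical analysis can be sketched numerically as the solution of a linear system followed by an eigen-decomposition. This is an illustrative sketch using NumPy rather than the desk methods of the paper; the readings b_11 = -1.436, b_23 = 0.744 and b_45 = 0.144 are assumptions, and the variable names are hypothetical.

```python
import numpy as np

b0 = 68.173
b = np.array([3.194, 1.456, 0.981, 3.419, 1.419])
quad = [-1.436, -1.348, -2.723, -2.261, -1.036]     # b_11 ... b_55
inter = {(0, 1): -1.869, (0, 2): 2.081, (0, 3): -0.506, (0, 4): -0.431,
         (1, 2): 0.744, (1, 3): -0.169, (1, 4): -0.769,
         (2, 3): -4.019, (2, 4): -0.744, (3, 4): 0.144}

# Symmetric matrix of second order terms: y = b0 + b'x + x'Bx
B = np.diag(quad)
for (i, j), value in inter.items():
    B[i, j] = B[j, i] = value / 2.0

# Stationary point S: the gradient b + 2Bx vanishes
x_s = np.linalg.solve(-2.0 * B, b)
y_s = b0 + 0.5 * b @ x_s

# Eigenvalues are the canonical coefficients; eigenvector columns give the
# directions of the canonical axes X_1 ... X_5 (each up to an arbitrary sign)
lam, vecs = np.linalg.eigh(B)
```

`np.linalg.eigh` returns eigenvalues in ascending order, which here matches the ordering of the coefficients in (16) from the most negative to the single small positive one.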


We see from equation (16) that once again two of the canonical 
factors, X_4 and X_5, are playing a comparatively minor role in describing 
the function, their coefficients being of a similar order of magnitude to 
their standard errors. At least locally most of the change can be 
described by the canonical variables X_1, X_2, X_3. 

Further experiments were now performed to determine whether 
these general findings as to the nature of the surface could be confirmed. 
To do this eleven further experiments were carried out, one at S, the 
predicted center of the system, and the others in pairs about S along 
the axes X_1, X_2, X_3, X_4 and X_5. Each of these points was at a distance 
two units from the center. Thus in terms of the new variables X_1, X_2, 
X_3, X_4, X_5 the coordinates of these points were (0, 0, 0, 0, 0); (±2, 
0, 0, 0, 0); (0, ±2, 0, 0, 0); ...; (0, 0, 0, 0, ±2). 

The levels of the original variables corresponding to these points 
were calculated from the equation expressing the x’s in terms of the 
X’s, and are the coordinates of the experimental points 22 to 32 in 
Table 3. This table also shows the yield predicted from the fitted 
second degree equation (in brackets) and the values actually found. 
Although the predicted yields are not always in very close agreement 
with those found, it is apparent that the additional experiments confirm 
the overall features of the surface in a rather striking manner. 
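Since the rows of (17) are orthonormal, the old coordinates follow from the new by transposition: x = x_S + M'X. A minimal sketch, with the hypothetical helper name `design_point`:

```python
import numpy as np

x_s = np.array([1.2263, -0.5611, -0.0334, 0.6916, 0.6977])   # center S, from (15)

# Rows are the canonical axes X_1 ... X_5 as given in (17)
M = np.array([
    [-0.2300, -0.1289,  0.7581,  0.5953, 0.0382],
    [ 0.6518,  0.5994, -0.0343,  0.4116, 0.2128],
    [-0.2245,  0.2489,  0.2520, -0.4058, 0.8120],
    [ 0.4255, -0.7445, -0.1437,  0.1561, 0.4686],
    [-0.5392,  0.0883, -0.5831,  0.5359, 0.2726],
])

def design_point(X):
    """Standardized levels x for a point given in canonical coordinates X."""
    return x_s + M.T @ np.asarray(X, dtype=float)

trial_23 = design_point([2, 0, 0, 0, 0])     # two units along +X_1
trial_32 = design_point([0, 0, 0, 0, -2])    # two units along -X_5
```

The results agree, to the two decimals quoted, with the levels listed for the corresponding confirmatory trials.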

It will be seen that the experimental points on either side of S along 
the X_1 axis are associated with the largest reductions in yield, approxi- 
mately confirming the large coefficient of X_1^2 in the canonical equation. 
Somewhat smaller changes are associated with movement away from 
S along the X_2 and X_3 axes and these again are of the order expected. 
Finally movement along the X_4 and X_5 axes is associated with only 
minor changes in yield, thus confirming that there is an approximately 
stationary plane and that we have a wide local choice of almost equally 
satisfactory conditions. After including the information from these 11 
extra points the canonical form of the second degree equation becomes 

    Y - 73.38 = -4.46X_1^2 - 2.64X_2^2 - 1.80X_3^2 - 0.39X_4^2 - 0.0[illegible]X_5^2 

which agrees closely with that found before. The transforming 
equations (17) are affected very little. The estimated experimental 
error variance based on the residual sum of squares having 11 degrees 
of freedom is 1.51; using this value it appears that the standard errors of 
the estimated coefficients are all less than 0.4. The sets of conditions 
which yield approximately the same results are in the plane defined 
by the equations X_1 = 0, X_2 = 0, X_3 = 0. Using (17) these equations 
are readily transformed to equations in the five unknowns x_1, x_2, x_3, 
x_4 and x_5, and using (14) to equations in the five natural variables. 
We have two “degrees of freedom” in the choice of conditions in the 
sense that within the region considered if we choose values for any two 
of the variables we obtain three equations in three unknowns which 
can be solved to give appropriate levels for the remaining three variables. 
To demonstrate the practical implications of these findings to the experi- 
menter, a table of approximate alternatives, which covered the range of 
interest, was calculated as is shown in Table 4. 

TABLE 4 
NEARLY ALTERNATIVE CONDITIONS ON PLANE OF X_4 AND X_5 

  X_4 
   2    T_1 °C     120     124     128     132     136 
        t_1 hrs    1.9     1.8      ..      ..      .. 
        c   %       55      ..      ..      ..      .. 
        T_2 °C      48      ..      ..      ..      32 
        t_2 hrs    3.2     2.6     2.1     1.7      .. 

   1    T_1 °C     117     121     125     129      .. 
        t_1 hrs    2.4     2.3     2.3      ..      .. 
        c   %       57      62      68      74      80 
        T_2 °C      47      43      39      35      31 
        t_2 hrs    2.2     1.8     1.4      ..     0.9 

   0    T_1 °C     114     118     121     126     130 
        t_1 hrs    3.1     3.0     2.9      ..      .. 
        c   %       ..      64      70      75      81 
        T_2 °C      46      42      38      34      .. 
        t_2 hrs     ..      ..     1.0     0.8     0.6 

  -1    T_1 °C      ..      ..      ..      ..      .. 
        t_1 hrs    4.0     3.9     3.8      ..     3.5 
        c   %       59      ..      ..      77      .. 
        T_2 °C      45      ..      ..      ..      .. 
        t_2 hrs    1.0     0.8      ..     0.5     0.4 

  -2    T_1 °C     107     111      ..      ..      .. 
        t_1 hrs     ..     5.0     4.9     4.7     4.6 
        c   %       ..      ..      ..      ..      .. 
        T_2 °C      ..      39      35      31      27 
        t_2 hrs     ..     0.6      ..     0.4     0.3 

                                  X_5 → 

Entries shown as .. are not legible in the source. 
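A table of alternatives of this kind can be generated by stepping along the X_4 and X_5 axes from S and converting back to natural units. The sketch below is illustrative only: it uses the center (15) and axes (17) from the first fit, whereas the published table was computed after the refit, so small differences are to be expected; the coding of T_1 assumes levels of 105 and 120 °C at -1 and +1, and the function names are hypothetical.

```python
import numpy as np

x_s = np.array([1.2263, -0.5611, -0.0334, 0.6916, 0.6977])    # center S, from (15)
axis_X4 = np.array([0.4255, -0.7445, -0.1437, 0.1561, 0.4686])  # X_4 row of (17)
axis_X5 = np.array([-0.5392, 0.0883, -0.5831, 0.5359, 0.2726])  # X_5 row of (17)

def natural(x):
    """Standardized levels -> (T1 degC, t1 hrs, c %, T2 degC, t2 hrs), inverting (14)."""
    x1, x2, x3, x4, x5 = x
    return (112.5 + 7.5 * x1,            # assumed coding for T1
            5 * 2 ** ((x2 - 1) / 2),
            70 + 10 * x3,
            32.5 + 7.5 * x4,
            1.25 * 5 ** ((x5 - 1) / 2))

def alternative(X4, X5):
    """Conditions on the plane X_1 = X_2 = X_3 = 0 at the given X_4, X_5."""
    return natural(x_s + X4 * axis_X4 + X5 * axis_X5)

corner = alternative(2, 2)
center = alternative(0, 0)
```

Under these assumptions `corner` comes out near 120 °C, 1.9 hrs, 55%, 48 °C and 3.2 hrs, and `center` near 122 °C, 2.9 hrs, 70%, 38 °C and 1.0 hrs.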


10. A falling ridge in cost. 


This third example concerns the improvement of a process already 
in operation in which two solids A and B were fused at high temperature 
to give a third substance C. The object was to reduce, if possible the 
manufacturing cost of C. The amount of B was kept constant through- 
out the experiment and the three factors were studied at the levels 
indicated below. 


Factor levels in units of the factorial design:        -3      -1       1 

Factor levels     Fusion Temperature °C (T)            240     245     250 
in Natural        Fusion Time (hours) (t)               32      24      16 
Units             Molar Ratio of A to B (M)            3.5     4.5     5.5 


The standardized variables x_1, x_2, x_3 are thus expressed in terms of 
the natural variables as follows, 

    x_1 = (T - 247.5)/2.5,   x_2 = (t - 20)/(-4),   x_3 = (M - 5.0)/0.5      (18) 


In the trials which are now described the yield corresponding to each 
set of factor combinations was determined experimentally and the 
expenditure involved in running the process with the factors at these 
levels was calculated. (High temperature and longer times both 
involved greater fuel consumption, for example, and greater molar 
ratios involved the use of larger amounts of the expensive material A.) 
Using the experimentally determined yield and the calculated expendi- 
ture it was now possible to calculate the cost of manufacturing one 
pound of product at each set of conditions tried. For convenience in 
calculation an arbitrary quantity has been subtracted from the cost 
per pound of product assessed in tenths of a penny. This is the response 
recorded below and we will refer to it subsequently as the “cost”. 
The requirement was to find a minimum on the cost response surface. 


TABLE 5 
LEVELS OF EXPERIMENTAL VARIABLES AND RESULTS OBTAINED 

                          Trial    x_1   x_2   x_3    Cost 

2^3 Factorial               1       1     1     1      37 
                            2       1     1    -1      70 
                            3       1    -1     1      70 
                            4       1    -1    -1      39 
                            5      -1     1     1      64 
                            6      -1     1    -1      74 
                            7      -1    -1     1      48 
                            8      -1    -1    -1      18 

Additional points to        9      -3    -1    -1      90 
form composite             10      -1    -3    -1      52 
design                     11      -1    -1    -3      16 

Confirmatory               12       0     0     0      38 
points                     13      -3    -3    -3      48 
                           14      -1    -1     0      33 


The experiments carried out are listed in Table 5. The first eight 
comprised a 2^3 factorial design with the standardized variables at the 
levels -1 and +1. Assuming that some second degree equation is 
adequate to express the cost function, unbiased estimates of the linear 
terms and two-factor interaction terms may be obtained from the 
eight results as follows, 

    b_1 = 1.50,        b_2 = 8.75,        b_3 = 2.25, 
    b_{12} = -9.25,    b_{13} = -2.75,    b_{23} = -13.00. 

The experimental error standard deviation was expected to be in 
the neighborhood of eight units so that an estimate of the standard 
errors of these coefficients would be about three units. 

We see that this is again an example where the first order effects are 
neither dominant nor negligible compared with two factor interactions. 
The design was therefore extended to form a non-central composite 
design. The factor-combination associated with the smallest cost 
is at the point (-1, -1, -1) and, bearing in mind that a minimum 
is being sought, augmenting the design at this apex would seem to be 
supported by the signs of the estimated coefficients. Points were 
added therefore at (-3, -1, -1), (-1, -3, -1) and (-1, -1, -3). 

All first and second order coefficients in the second degree equation 
could now be estimated. Canonical analysis of the fitted equation 
indicated a falling ridge passing near the center of the factorial design 
in the direction of negative levels of the variables. Two further con- 
firming experiments were conducted therefore, one at the center of the 
factorial design (0, 0, 0) and the other in which the levels of all three 
variables were reduced by three units (—3, —3, —3). A further 
experiment (performed in error) was at the point (—1, —1, 0). 

The information from these three additional experiments was now 
included using Plackett’s technique and the coefficients in the equation 
fitted to all 14 points were now as follows, 


    b_0 = 28.19 
    b_1 = 1.53           b_{12} = -7.28     b_{11} = 11.23 
    b_2 = 8.78           b_{13} = -0.81     b_{22} = 10.85 
    b_3 = [illegible]    b_{23} = -11.06    b_{33} = 3.11 


The coordinates of the center of the fitted system and the predicted 
cost at the center were as follows, 

    x_{1S} = 3.87,   x_{2S} = 10.13,   x_{3S} = 18.12,   Y_S = 96.61       (19) 

The canonical form of the equation is 

    Y - 96.61 = -0.16X_1^2 + 9.49X_2^2 + 15.86X_3^2                        (20) 

and the transforming equations 

           (x_1 - 3.87)   (x_2 - 10.13)   (x_3 - 18.12) 

    X_1       0.1871          0.4897          0.8516 
    X_2       0.7995          0.4290         -0.4205                      (21) 
    X_3       0.5695         -0.7599          0.3133 


We notice that although this example is similar to the two previous 
ones in that the canonical equation contains a coefficient small com- 
pared with the others, it differs from these in one very important respect. 
Whereas in the other examples the center of the system has been in 
the immediate vicinity of the center of the design, here the center of the 
fitted system is remote from the design. (The design extends from 
-3 to 1 in each of the variables x_1, x_2, x_3, but the center of the fitted 
contour surface is at the point (3.87, 10.13, 18.12).) 

We have then the situation in which, 


i) no first order effects, which are very large compared with second 
order effects, occur, 

ii) one of the coefficients in the canonical equation is small com- 
pared with the remainder, 

iii) the center of the fitted system is remote from the center of the 
design 

This, as we have seen, may indicate a sloping ridge. 

The small coefficient -0.16* is associated with the axis X_1, which 
(since b_1, b_2 and b_3 are not large compared with b_{11}, b_{12}, etc.) is expected 
to pass close to the design to form the ‘axis’ of the ridge system. The 
point (which we will call S’) on the X_1 axis closest to the center of the 
factorial design is at x_1 = -0.08, x_2 = -0.21, x_3 = 0.14, confirming that 
the axis does indeed pass close to the center of the design. These coordi- 
nates of S’ are found as follows. Using the equations (21) it is readily 
found that the X_1 coordinate of the point x_1 = 0, x_2 = 0, x_3 = 0 is 
-21.116. Whence it follows that S’ is at the point X_1 = -21.116, 
X_2 = 0, X_3 = 0. Its position in terms of the x coordinates given above 
may now be calculated using the transforming equations (21) once 
more. The predicted cost at S’ is 27.32. 
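The location of S’ is a projection of the design center onto the X_1 axis, which a short calculation makes concrete. This is a sketch using NumPy with hypothetical variable names; the value -0.1554 for the small coefficient is the more exact figure given in the footnote.

```python
import numpy as np

x_s = np.array([3.87, 10.13, 18.12])       # center of the fitted system, (19)
M = np.array([[0.1871,  0.4897,  0.8516],  # rows are X_1, X_2, X_3 of (21)
              [0.7995,  0.4290, -0.4205],
              [0.5695, -0.7599,  0.3133]])
lam1 = -0.1554                             # small canonical coefficient
y_s = 96.61                                # predicted cost at the center S

# X_1 coordinate of the design center x = (0, 0, 0)
X1_origin = M[0] @ (np.zeros(3) - x_s)

# S' lies on the X_1 axis (X_2 = X_3 = 0) at that X_1 value
s_prime = x_s + X1_origin * M[0]
cost_s_prime = y_s + lam1 * X1_origin ** 2

# Slope of the cost along the ridge at S'
slope = 2 * lam1 * X1_origin
```

The slope evaluates to about 6.56 cost units per unit of distance along the ridge.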

Taking S’ as the new center (by writing X_1 = X_1' - 21.116 in equa- 
tion (20)) we obtain the fitted equation referred to the local origin S’ 
in the form 

    Y - 27.32 = +6.56X_1' - 0.16X_1'^{2} + 9.49X_2^2 + 15.86X_3^2           (22) 

and the surface approximates Figure (11g) but with the X_1 axis taking 
the place of the ridge axis in that diagram. 

The coefficient 6.56 measures the slope at S’ down the ridge. Thus 
locally each unit which we move down the ridge will be accompanied 
by a reduction of about six units in cost. 

Any point on the ‘ridge axis’ X_1 can readily be found from equations 
(21) and (18). In particular the points at which costs of 10, 0, and 
-10 were predicted are as follows, 


*To avoid large rounding errors the more exact value — 0.1554 is used in the subsequent calculations. 




           Temperature    Time    Molar Ratio    Predicted Cost 

   (i)        246.1       25.7       4.01              10 
   (ii)       245.5       28.2       3.46               0     2.6 (found) 
   (iii)      244.9       30.7       2.91             -10 
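Points of given predicted cost on the ridge axis can be found by solving the quadratic (22) with X_2 = X_3 = 0 and converting through (21) and (18). The sketch below is illustrative, with a hypothetical function name; it reproduces row (i) of the table to within rounding and the other rows approximately.

```python
import math
import numpy as np

x_s = np.array([3.87, 10.13, 18.12])             # center S, from (19)
ridge_axis = np.array([0.1871, 0.4897, 0.8516])  # X_1 row of (21)
lam1 = -0.1554                                   # more exact small coefficient
X1_at_S_prime = -21.116
cost_at_S_prime = 27.32
slope = 2 * lam1 * X1_at_S_prime                 # about 6.56, as in (22)

def ridge_point(target_cost):
    """(T, t, M) on the ridge axis where the predicted cost equals target_cost."""
    # Solve lam1*u**2 + slope*u + (cost_at_S_prime - target_cost) = 0 for the
    # displacement u from S' along the axis, taking the root nearest S'
    a, b, c = lam1, slope, cost_at_S_prime - target_cost
    disc = math.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    u = min(roots, key=abs)
    x = x_s + (X1_at_S_prime + u) * ridge_axis
    # Back to natural units through (18)
    T = 247.5 + 2.5 * x[0]
    t = 20.0 - 4.0 * x[1]
    M = 5.0 + 0.5 * x[2]
    return T, t, M

T_i, t_i, M_i = ridge_point(10)
```

Lower cost targets move the point towards lower temperature, longer fusion time and smaller molar ratio, which is the direction followed in the table.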


In practice the conditions (ii) above which involved a saving of 
about 20 units of cost represented the limit to which the ridge could be 
followed. Beyond this point mechanical difficulties arose in running 
the process. It is of interest to note that the process studied had already 
been investigated thoroughly (using the one factor at a time method). 
This had led to the ridge at which point it had been assumed that no 
further improvement was possible. Experiments which made possible 
the approximate elucidation of the nature of the multifactor dependence 
were able to lead to further marked improvement. 


SOME CONSIDERATIONS ARISING FROM THESE INVESTIGATIONS 
11. Demonstration of Results. 


It is essential to the ultimate success of experimental work that those 
concerned with it, in particular those who must make decisions as a 
result of it, and who may have no special knowledge of statistics or 
mathematics, should clearly appreciate the conclusions that have been 
reached. In a chemical application, months of effort will often have 
been expended in the collection of the data and many days in computing 
the results. This labor will be wholly or partially wasted unless a 
further effort is expended to demonstrate what has been discovered in 
such a way as to be readily comprehensible. 

Particular ways of demonstrating conclusions have been illustrated 
in the examples; thus the alternative processes available in the example 
of §8 were shown in Figure 13 using tri-linear coordinates. It was also 
possible to show on the same diagram the region in which an acceptably 
small amount of the by-product D was obtained. Again, even though 
in the example of §9 we were dealing with five variables and consequently 
the geometrical surface could only be fully expressed in a space of five 
dimensions, yet the implications of the two dimensional ridge system 
were clearly brought out by listing alternative processes over the essen- 
tial region in Table 4. 

To allow ready appreciation of the possibilities for process improve- 
ment which exist in a given situation the geometrical method is most 
valuable. This is particularly true when more than one response is 
to be studied. Contour representation allows the interrelationships 
between as many as three variables and one or more responses to be 
easily comprehended, and the actual construction of three-dimensional 
contour models to represent such surfaces as those shown in Figures 7 
and 12 is often worth while. These models may be constructed in 
various ways. One useful method is to outline the contour surface by 
colored wires. The experimental points may be marked by plastic 
counters on which the observed responses may be written, thus allowing 
the experimental arrangement and the fitted contours to be seen simul- 
taneously. A number of suitably placed wire grids produces a frame- 
work on which to build the model and these do not obstruct the view 
of the contours and experimental points. A photograph of one such 
model showing the fitted surfaces before the addition of confirmatory 
points for the example of §8 is shown below. In order that the experi- 
mental points would show more clearly on the photograph the plastic 
counters have here been replaced by marbles. 

FIGURE 15. PHOTOGRAPH OF A THREE DIMENSIONAL MODEL SHOWING THE CON- 
TOURS OF THE APPROXIMATE PLANE STATIONARY RIDGE SYSTEM CONSTRUCTED 
ON THE BASIS OF TRIALS RUN AT THE POINTS INDICATED BY THE MARBLES. A 
DESCRIPTION OF THIS PARTICULAR EXPERIMENT IS FOUND IN SECTION 8. THE 
TYPE OF EXPERIMENTAL DESIGN USED IS THAT ILLUSTRATED IN FIGURE 8. 

When there are two responses to consider, sets of contours for both 
responses may be shown on the same model or alternatively two models 
may be constructed and the results assessed by viewing them side by 
side. With more than three variables, selected three dimensional 
sections or two dimensional sections of the k-dimensional contour 
surface may be constructed to illustrate important features of the total 
surface. 


12. The Analysis of Multifactor Dependence. 


It is well known of course that the existence of interactions between 
continuous variables implies factor dependence of some sort. In the 
past however interpretation of such dependence has not always been 
very satisfactory or helpful because, 

i) the interactions were usually considered piece-meal and not as 
a whole, 

ii) attempts were made to interpret interaction effects ignoring 
the influence of other terms of the same order (in particular two factor 
interactions estimated from two-level factorial experiments have been 
considered in the absence of the corresponding quadratic effects). 

In an experiment (such as that in §9 above) in which there are five 
factors, there are no less than 10 two-factor interactions. Unless most of 
these were negligible it would be almost impossible by individual study 
to appreciate the joint effect of these constants. 

To attempt to interpret two factor interactions without the cor- 
responding quadratic effects is precisely analogous to considering 
covariances without the corresponding variances. If we were told that 
a certain covariance between two variables y, and y, was large (say 
equal to 1000) we would not know whether y, and y, were closely cor- 
related or not. (For example if the variances were each equal to a 
million the dependence between y, and y, would be negligible, if they 
were each equal to 1001, y, and y, would be almost completely cor- 
related.) In a similar way, knowledge of the interaction taken alone, 
without the corresponding quadratic effects will not enable us to appreci- 
ate the nature and importance of the factor dependence. 
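The two cases in the parenthesis can be made concrete by converting the covariance to a correlation coefficient, r = cov / sqrt(var_1 var_2). A minimal sketch; the function name is hypothetical.

```python
import math

def correlation(cov, var1, var2):
    """Correlation coefficient implied by a covariance and two variances."""
    return cov / math.sqrt(var1 * var2)

# Same covariance of 1000 in both cases
weak = correlation(1000, 1_000_000, 1_000_000)   # variances of a million each
strong = correlation(1000, 1001, 1001)           # variances of 1001 each
```

The first case gives a correlation of 0.001 and the second about 0.999, although the covariance is the same in both.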

It is a deficiency of the three level factorial design so far as the 
present problem is concerned that the quadratic effects are estimated 
with only half the precision (twice the variance) of the interaction 
effects. Since the second order derivative \partial^2 y/\partial x_1^2 equals 2b_{11} 
while the mixed derivative \partial^2 y/\partial x_1 \partial x_2 equals b_{12}, this means 
that the second order derivative at the centre of the design is estimated 
with only one eighth the precision of the mixed derivative. 

It is sometimes found in the analysis of variance of three level 
factorial designs that two-factor interactions are ‘significant’ whereas 
quadratic effects are not. This has led to a supposition that conditions 
frequently occur in which two-factor interactions are important but 
quadratic effects are unimportant, which apparently conflicts with the 
common sense view that for a smooth surface effects of the same order 
ought to be of equal importance. That this contradiction is apparent 
rather than real can be seen if we remember that the expected values 
of the mean squares in the analysis of variance are of the form 

    \sigma^2 + \beta^2 / \{V(b)/\sigma^2\}                                   (23) 

It will be noted that the second term in (23), which will cause the mean 
square to be inflated when real effects occur, is a function not only of 
the size of the real effect \beta but also of the variance V(b) of the estimate 
of this quantity. Thus if real quadratic and interaction derivatives of 
equal magnitude occurred the inflation of the mean square for the inter- 
action effect would be eight times as large on the average as the inflation 
of the mean square for the quadratic effect. 

It is of course extremely important that before attempting its 
interpretation we should ensure that the fitted equation is meaningful. 
This can be done by considering the magnitude of the standard errors 
of the coefficients, and a convenient ‘portmanteau test’ is provided 
by the analysis of variance. 

The whole sum of squares due to regression may be divided into 
components ‘due to mean’, ‘due to all first order effects’, ‘due to all 
second order effects’, and so on. Where there is some doubt as to whether 
a certain variable x_i has any influence at all, or any effect other than a 
linear one, mean squares for ‘all effects involving the ith factor’, or ‘all 
second order effects involving the ith factor’ can be computed. 

There would however seem to be little merit and some danger in 
the practice sometimes adopted of testing coefficients individually and 
dropping those which were ‘not significant’ (that is, could not be 
demonstrated to be different from zero). By so doing we would be replac- 
ing an unbiased estimate of smallest variance by an estimate (zero) 
which had neither of these qualities. 


13. Quantitative and Qualitative Factors 


The techniques discussed have been for the attainment of a maximum 
when the factors studied were quantitative like “temperature’’, “time’’, 
“concentration”, not qualitative like “type of reactant” or “operator 
performing experiment”’. 

The statistical designs chiefly used heretofore have been the com- 
plete and fractional factorials, the complete and incomplete block 
designs, latin squares and lattices. The analysis of results from experi- 
ments using these designs has usually been aimed only at estimating 
the effects, calculating confidence intervals, and testing significance 
by means of the analysis of variance technique. The formal application 
of such designs, and much of the analysis of the results, is identical 
whether the factors are qualitative or quantitative. It is perhaps this 
circumstance that has sometimes led to insufficient distinction being 
drawn between these two very different types of ‘factors’ and to a 
consequent idea that a method of experimentation ought to cope 
equally with both quantitative and qualitative factors. 

When some of the factors are qualitative variables it is the writer’s 
belief that there is no way of finding the absolute optimum other than 
carrying out separate investigations for each qualitative ‘factor’ com- 
bination unless some very specialized prior assumptions can be made 
about the qualitative factors. 

If the experimental designs were so comprehensive that the whole 
experimental region for the quantitative variables were explored for 
each qualitative variable, this would amount to the same thing as 
carrying out a separate investigation for each variable. If less than 
the whole experimental region were included a design which involved 
qualitative factors with quantitative factors would seem only to provide 
a means for testing whether the response surfaces in the qualitative 
factors were similar or not. Suppose for example that the experimenter 
was attempting to find the optimum combination of three ‘factors’, 
two of which ‘temperature’ (7') and ‘concentration’ (c) were quantitative, 
and the remaining one ‘type’ of reactant used (A or B) was qualitative. 
We can imagine the temperature, concentration, yield-contour diagrams 
for reactants A and B in a particular temperature and concentration 
region being like those shown in Figure 14 so that, for example, a 2^3 
factorial experiment performed for the three factors can be imagined to 
have its experimental points embedded in the contour diagrams in the 
manner illustrated. 

i) If the mechanism of the reaction with A was different from that 
with B (as would usually happen if A and B were essentially different 
substances) then the two temperature-concentration yield surfaces would 
also be different (as are those illustrated). Large interactions between 
the temperature-concentration effects and reagent effects would now 
be found. The only conclusion the experimenter could reach would 
be that if both reactants were to be seriously considered the surface 
for each would have to be examined separately. 

ii) If A and B were essentially the same substance but were perhaps 
from different batches of material it could well happen that the two 
response surfaces were identical apart perhaps from some differences in 
mean level. The interaction terms involving the type of reactant 
would then all be small and we would draw the conclusion that at least 
locally the yield surfaces were similar. This might lead to the tentative 
assumption that further experiments could be conducted with only 
one reactant. It will be noted that the circumstances here are rather 
specialized. 

iii) A further possibility might occasionally be worth considering. 
This is that the surface for reactant A is the same as that for reactant B 
but displaced. For example, if the design was such as to allow the 
fitting of a second degree temperature-concentration response surface 
at each level of the reactant then displacement would be indicated if the 
second order effects were nearly equal but the linear effects were different. 

[Figure 14: yield contour diagrams in temperature and concentration, 
one panel for REACTANT A and one for REACTANT B] 
FIGURE 14. THREE FACTOR SYSTEM: TWO QUANTITATIVE FACTORS, 
ONE QUALITATIVE FACTOR 

It will be noted in particular that in (i) if all the yields from trials 
using A were higher than those using B the experimenter should not 
conclude that in subsequent experiments only A need be considered for 
it might be that the experiments had been conducted near the optimum 
levels of temperature and concentration with A, but that a higher 
maximum was attainable using B in some other region. 

In the writer’s opinion, the type of experiment described above 
(limited as its purpose is to test the hypothesis that the yield surfaces
at the two levels of the reactant are similar against the alternative
hypothesis that they are different) would often not need to be performed. 
In particular, the experimenter would usually know in advance whether 
a qualitative factor was likely to behave like that in (i) or (ii). 

I am indebted to Mr. J. S. Hunter and to Dr. R. J. Hader for assist- 
ance in the preparation of this paper. 




REFERENCES 

(1) Box, G. E. P., and Wilson, K. B. On the Experimental Attainment of Optimum
Conditions. J. R. Stat. Soc., B, 13: 1, 1951.

(2) The Design and Analysis of Industrial Experiments. G. E. P. Box, L. R. Connor,
W. R. Cousins, O. L. Davies, F. R. Himsworth and G. P. Sillitto, edited by
O. L. Davies, Edinburgh: Oliver and Boyd, 1953.

(3) Thurstone, L. L. Multiple Factor Analysis. University of Chicago Press, 
Chicago. 1948. 

(4) Plackett, R. L. Some Theorems on Least Squares. Biometrika 36: 458, 1949. 

(5) Box, G. E. P. and Hunter, J. S. Technical Report No. 5 prepared under Ordnance
Contract DA/36/034/ORD/1177. 

(6) Booth, A.D. An Application of the Method of Steepest Descents to the Solution 
of Systems of Non-Linear Simultaneous Equations. Quart. J. Appl. Math., 1: 
237, 1949. 

(7) Koshal, R. S. Application of the Method of Maximum Likelihood to the Im- 
provement of Curves Fitted by the Method of Moments. J. R. Stat. Soc. 96: 
303-13, 1933. 

(8) Box, G. E. P. Multi-Factor Designs on First Order. Biometrika 39: 49, 1952. 

(9) Box, G. E. P. and Hunter, J. S. Study and Exploitation of Response Surfaces.
Paper read before East North American Regional Meeting of Biometric Society, 
December 1953. 




DOUBLY BALANCED INCOMPLETE BLOCK DESIGNS 
FOR EXPERIMENTS IN WHICH 
THE TREATMENT EFFECTS ARE CORRELATED 


Lyle D. Calvin*
Institute of Statistics, North Carolina State College,
Raleigh, North Carolina 


INTRODUCTION 


The balanced incomplete block designs, which were introduced by 
Yates (1936), have found many applications in organoleptic testing 
(Boggs, 1951; Calvin, 1950; Galinat and Everett, 1949; Greenwood, 
et al, 1951; Hopkins, 1953; MacLean and Wickens, 1951; Smith, et al, 
1949; Smith, et al, 1950). It is the general consensus of food research
workers that, for most foods, a taster cannot differentiate among more 
than five to eight samples at one sitting and perhaps not more than 
four or five (Bengtsson and Helm, 1946; U.S. Bureau of Human Nu- 
trition and Home Economics, 1951; Crocker, 1945). This limitation is 
caused primarily by the inability of the judge to taste more than a 
few samples without becoming fatigued. The senses become dulled 
rather rapidly and wrong impressions are obtained if one attempts to 
taste too many samples. Therefore, when there are more treatments 
than can be tested at one sitting, an incomplete block design seems the 
immediate answer (Anderson and Bancroft, 1952). The analysis of 
such designs is simple and is described in detail by Cochran and Cox 
(1950). 

The balanced incomplete block designs were originally devised to 
allow for small block size when equal precision was desired on all treat- 
ment comparisons. Yates, in his original paper, pointed out two 
important situations when small block sizes are desired. The first is 
encountered when the number of treatments is so large that the amount 
of material needed for a complete block is very heterogeneous. By 
using small blocks, the material within each block can be more homo- 
geneous, giving more efficient estimates of treatment comparisons. 
The second situation when small blocks are desired is when the number 


*Present address: Agricultural Experiment Station, Oregon State College, Corvallis, Oregon.


of possible units per block is less than the total number of treatments.
This second situation is not entirely divorced from the first, because, 
by combining the units from several blocks a complete block can usually 
be made, only however, at the cost of introducing greater heterogeneity. 
Looking at the second situation from this point of view, the primary 
difference in the two situations is the difference in knowledge of the 
nature of the heterogeneity of the material used. 

Organoleptic experiments fall between the two situations as described. 
Experimenters are willing to acknowledge that there is a block size 
for most foods beyond which fatigue of the taster or judge causes the 
heterogeneity to be so great as to make any comparisons practically 
useless. As to just what the optimum block size is, no general answer 
can be given. As Crocker (1945) says, it depends upon the foods 
being tested. It is small enough in many cases, however, to require 
some type of incomplete block design. 

One of the assumptions underlying the analysis of variance is that 
there be no correlation among the observations within a block, other 
than that introduced by block and treatment effects (Eisenhart, 1947). 
In most cases where incomplete blocks are used, this assumption appears 
valid or nearly so or is accounted for by randomization. In tasting 
experiments, in which scores are assigned to the samples, there is indica- 
tion that this assumption is not always met (Boggs, 1951; Dove, 1943; 
Hanson, et al, 1951; Harrison and Elder, 1950; Hopkins, 1950). The 
score or rating that a particular sample receives is dependent upon the 
relative ratings of the other samples in the block. As well as attempting 
to give some objective measure of his preference or the amount of 
flavor, the judge also makes a comparison with other samples in the 
block. 

This effect, of dependence on or correlation with other samples in 
the same block, invalidates one of the assumptions underlying the 
analysis. If possible, one would like a method of removing or accounting 
for this effect. One possible method would be a transformation of the 
scores to remove the correlation. A number of transformations have 
been tried by the author, all without appreciable success. 

The basic problem seems to lie with the model. No allowance for 
correlation within the block is taken in the usual model used for the 
analysis of the balanced incomplete block designs. If a suitable model 
could be formulated which would account for the correlation effect, 
then an appropriate analysis of incomplete block experiments could be 
made. 

This problem has heretofore been attacked only by non-parametric 
methods. Nowhere in the literature does there seem to be a consideration
of extension of the parametric model. An explanation of this may
be threefold: (i) ranking of the samples avoids any distribution problems,
(ii) ranking methods may usually be used in analysis whether
ranking or scoring was used in testing, and (iii) the statisticians working
in this field are interested in non-parametric methods in general. 

The rank correlation methods, developed primarily by Kendall 
(1948), have been extended by Durbin (1951) to cover the analysis of 
incomplete block experiments in which the objects in each block have 
been ranked. A measure of the general concordance of the rankings is 
given by the coefficient of concordance, and a test of independence of 
the rankings is afforded. The procedure is not difficult to follow; 
however, no provision is made for tied ranks. 

Bradley and Terry (1952) have developed a rank analysis of incomplete
block designs when block size is two. They assume that when
treatment $i$ appears with treatment $j$ in a block, the probability that
treatment $i$ obtains top rating is $\pi_i/(\pi_i + \pi_j)$, where every
$\pi_i \ge 0$ and $\sum_i \pi_i = 1$.

Likelihood ratio tests of the null hypothesis,

$$H_0: \pi_i = 1/t \quad (i = 1, \ldots, t),$$

against the alternative hypotheses,

$$H_a: \pi_i = \pi(h) \quad (h = 1, \ldots, m;\ s_{h-1} < i \le s_h),$$

where $s_0 = 0$, $s_m = t$ and

$$\sum_{h=1}^{m} (s_h - s_{h-1})\,\pi(h) = 1,$$

are set up and tables provided for evaluating the probabilities associated 
with two of these tests. The alternative hypotheses associated with 
these two special tests are: (i) the treatment ratings are all unequal, 
i.e. $m = t$, and (ii) there are only two groups of treatments, which may
themselves have different ratings, i.e. $m = 2$, but within which the
treatments do not differ in ratings. Other alternative hypotheses 
involving only a subset of parameters do not lead to parameter-free 
tests. 

Until the analysis can be extended to block sizes greater than two, 
it is doubtful if the analysis will satisfy most requirements. The authors 
indicate that work is progressing in this direction but to date they 
have not found a satisfactory mathematical model. 
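
The paired-comparison assumption described above can be illustrated with a minimal sketch. The function name and the numerical ratings are hypothetical and chosen only for illustration; the sketch evaluates the assumed probability of top rating and does not carry out the likelihood ratio tests.

```python
def top_rating_probability(pi_i, pi_j):
    """Probability that treatment i is rated above treatment j in a block of two,
    under the assumption pi_i / (pi_i + pi_j)."""
    if pi_i < 0 or pi_j < 0 or pi_i + pi_j == 0:
        raise ValueError("ratings must be non-negative and not both zero")
    return pi_i / (pi_i + pi_j)


# Hypothetical true ratings for t = 3 treatments; they sum to 1.
ratings = [0.5, 0.3, 0.2]
t = len(ratings)

# Under the null hypothesis every rating equals 1/t.
null_ratings = [1.0 / t] * t

p_12 = top_rating_probability(ratings[0], ratings[1])
p_null = top_rating_probability(null_ratings[0], null_ratings[1])
```

Under the null hypothesis every pair is a fair coin, whatever the number of treatments.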




These two papers are apparently the only published attempts at 
analysis of the incomplete block designs which might prove useful 
when there is a lack of independence among scores of samples in the 
same block. By transforming the scores to ranks, either of the methods 
referred to may be used, with certain limitations, viz. the loss of in- 
formation due to forced ranking and no allowance for ties in Durbin’s 
method, and the necessity of blocks consisting of paired comparisons 
in Bradley and Terry’s method. Both analyses suffer some loss of 
efficiency compared to an appropriate parametric analysis because of 
the restriction of equal intervals between scores implied by the use of 
ranks rather than scores. 

For these reasons, it would be desirable to develop an analysis for an 
appropriate parametric model. 


GENERAL LEAST SQUARES ANALYSIS OF INCOMPLETE BLOCK DESIGNS 
ASSUMING CORRELATION MODEL 


Mathematical Model. 


The mathematical model which is proposed is: 


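$$Y_{hi} = n_{hi}\Bigl(\mu + \beta_h + \tau_i + \sum_{j \ne i} n_{hj}\, m_{ij}\, \alpha_{ij} + \epsilon_{hi}\Bigr), \tag{1a}$$

$$h = 1, 2, \ldots, q; \qquad i, j = 1, 2, \ldots, p,$$

where $Y_{hi}$ is the observed value for the $i$th treatment in the $h$th block;
$\mu$, $\beta_h$ and $\tau_i$ are effects for the mean, the $h$th incomplete block and the
$i$th treatment respectively; $\alpha_{ij}$ is the effect common to the $i$th and
$j$th treatments when they are both in the same block ($\alpha_{ij} = \alpha_{ji}$);
$\epsilon_{hi}$ is the residual or error, assumed to be normally and independently
distributed with mean zero and variance $\sigma^2$;

$$n_{hi} = \begin{cases} 1 & \text{when the $i$th treatment is in the $h$th block,}\\ 0 & \text{when the $i$th treatment is not in the $h$th block;} \end{cases}$$

$$m_{ij} = \begin{cases} 1 & \text{when } i < j,\\ -1 & \text{when } i > j. \end{cases}$$

Replacing the parameters by their estimates, the model becomes

$$Y_{hi} = n_{hi}\Bigl(m + b_h + t_i + \sum_{j \ne i} n_{hj}\, m_{ij}\, a_{ij} + e_{hi}\Bigr). \tag{1b}$$

For a least squares analysis, fixed treatment and block effects will
be assumed, which imply the restrictions

$$\sum_h b_h = 0, \qquad \sum_i t_i = 0, \qquad \text{and} \qquad \sum_j m_{ij} a_{ij} = 0. \tag{2}$$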




This model assumes a type of correlation which is somewhat more
general than may actually occur. The actual effect is probably to
spread the scores for treatment samples with true ratings not very
different and pull together scores for samples with ratings far apart.
Model (1a) allows for this effect but also allows for a more general
correlation pattern.

The analysis for the ordinary balanced incomplete block experiments 
is simplified with the following conditions in design: 


(i) The blocks shall be of constant size, $k$, i.e. $\sum_i n_{hi} = k$.
(ii) All $p$ treatments shall be tested an equal number of times, $r$, i.e.
$\sum_h n_{hi} = r$.
(iii) No treatment shall occur more than once in any block, i.e.
$n_{hi} = 1$ or $0$.
(iv) Each pair of treatments shall occur together in the $q$ blocks an
equal number of times, $\lambda$, i.e. $\sum_h n_{hi} n_{hj} = \lambda$.

Condition (iv) is the usual condition for a balanced design. These
conditions will also be imposed on the design for model (1a). It will
be evident from the normal equations that one further restriction on 
the experimental design will aid in simplifying the analysis of data 
based on this model. 


Least Squares Analysis. 


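By minimizing the sum of squares of the residuals in model (1b) the
resulting normal equations are:

$$
\begin{aligned}
m &: \quad rpm = G,\\
b_h &: \quad km + \sum_i n_{hi} t_i + k b_h = B_h,\\
t_i &: \quad rm + r t_i + \sum_h n_{hi} b_h + \lambda \sum_j m_{ij} a_{ij} = T_i,\\
a_{ij} &: \quad \lambda(t_i - t_j) + 2\lambda a_{ij} + \sum_h \sum_{l \ne i,j} n_{hi} n_{hj} n_{hl}\,(m_{il} a_{il} - m_{jl} a_{jl}) = A_{ij},
\end{aligned}
\tag{3}
$$

where

$$
\begin{aligned}
G &= \sum_h \sum_i n_{hi} Y_{hi}, & B_h &= \sum_i n_{hi} Y_{hi},\\
T_i &= \sum_h n_{hi} Y_{hi}, & A_{ij} &= \sum_h n_{hi} n_{hj}\,(Y_{hi} - Y_{hj}).
\end{aligned}
\tag{4}
$$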

[TABLE 1. THE NORMAL EQUATIONS RESULTING FROM MINIMIZING THE SUM OF SQUARES OF THE RESIDUALS IN THE REGRESSION EQUATION. The matrix entries are not legible in the source.]


The $b_h$ equation contains no correlation effects because each block
total includes both $a_{ij}$ and $-a_{ji}$ ($= -a_{ij}$) for all combinations of the
treatments in the block taken two at a time.
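
The cancellation can be seen in a small worked case. As an illustration only, suppose block $h$ contains just three treatments $i < j < l$; the block size of three is an assumption made to keep the display short. Writing out model (1a) for the three observations gives:

```latex
\begin{aligned}
Y_{hi} &= \mu + \beta_h + \tau_i + \alpha_{ij} + \alpha_{il} + \epsilon_{hi},\\
Y_{hj} &= \mu + \beta_h + \tau_j - \alpha_{ij} + \alpha_{jl} + \epsilon_{hj},\\
Y_{hl} &= \mu + \beta_h + \tau_l - \alpha_{il} - \alpha_{jl} + \epsilon_{hl},\\[4pt]
Y_{hi} + Y_{hj} + Y_{hl} &= 3\mu + 3\beta_h + \tau_i + \tau_j + \tau_l
  + \epsilon_{hi} + \epsilon_{hj} + \epsilon_{hl}.
\end{aligned}
```

Each $\alpha$ enters once with a plus sign and once with a minus sign, so the block total is free of correlation effects.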

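The $a_{ij}$ equation is simplified if

$$\sum_h n_{hi} n_{hj} n_{hl} = \delta,$$

i.e. if each triplet of treatments occurs together in a block $\delta$ times.

When this condition is satisfied by the design, the $a_{ij}$ equation becomes

$$\lambda(t_i - t_j) + 2\lambda a_{ij} + \delta \sum_{l \ne i,j} m_{il} a_{il} - \delta \sum_{l \ne i,j} m_{jl} a_{jl} = A_{ij}. \tag{5}$$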

Because the restriction on triplets of treatments is the second 
condition for balance in the design, designs which have balance both 
for pairs and triplets will be called doubly balanced incomplete block 
designs. 
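
Whether a given set of blocks is doubly balanced can be checked by direct counting. The following is a minimal sketch, not part of the original analysis; the function names are hypothetical. It counts how often each treatment, pair and triplet occurs and raises an error if any of the counts is not constant.

```python
from collections import Counter
from itertools import combinations
from math import comb


def _common_count(counts, n_expected, label):
    """Return the single value shared by all counts, or raise ValueError."""
    if not counts:
        return 0  # e.g. no triplets exist when the block size is two
    values = set(counts.values())
    if len(counts) != n_expected or len(values) != 1:
        raise ValueError(f"{label} do not occur equally often")
    return values.pop()


def design_parameters(blocks):
    """Return p, q, k, r, lambda and delta for a doubly balanced design."""
    sizes = {len(b) for b in blocks}
    if len(sizes) != 1:
        raise ValueError("blocks are not of constant size")
    if any(len(set(b)) != len(b) for b in blocks):
        raise ValueError("a treatment is repeated within a block")
    k = sizes.pop()
    p = len({t for b in blocks for t in b})

    singles, pairs, triplets = Counter(), Counter(), Counter()
    for b in blocks:
        members = sorted(b)
        singles.update(members)
        pairs.update(combinations(members, 2))
        triplets.update(combinations(members, 3))

    return {
        "p": p,
        "q": len(blocks),
        "k": k,
        "r": _common_count(singles, p, "treatments"),
        "lambda": _common_count(pairs, comb(p, 2), "pairs"),
        "delta": _common_count(triplets, comb(p, 3), "triplets"),
    }


# All combinations of six treatments taken four at a time.
all_fours = list(combinations(range(1, 7), 4))
params = design_parameters(all_fours)
```

For the design with all combinations of six treatments in blocks of four, the counts are r = 10, lambda = 6 and delta = 3.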

In Table 1 is given the matrix representing the normal equations
for a general doubly balanced incomplete block design. Because
$a_{ij} = a_{ji}$, only the $a_{ij}$ with $i < j$ have been included in the normal
equations and hence in the analysis.

By subtracting $\lambda \sum_j m_{ij} a_{ij} = 0$ from each $t$ equation, all the $a$
effects drop out of the $t$ equations. Then by eliminating the $b$ coefficients
from the $t$ equations, the adjusted $t_i$ equation becomes

$$r\left(1 - \frac{1}{k}\right) t_i - \frac{\lambda}{k} \sum_{j \ne i} t_j = T_i - \frac{1}{k} B_{i\cdot}, \tag{6}$$

where

$$B_{i\cdot} = \sum_h \sum_j n_{hi} n_{hj} Y_{hj} = \sum_h n_{hi} B_h, \tag{7}$$

i.e. $B_{i\cdot}$ is the sum of all block totals which contain the $i$th treatment.
By adding $(\lambda/k) \sum_j t_j = 0$ to the $t_i$ equation, the adjusted equation
becomes

$$\left[r\left(1 - \frac{1}{k}\right) + \frac{\lambda}{k}\right] t_i = T_i - \frac{1}{k} B_{i\cdot} = Q_i. \tag{8}$$

The coefficient of $t_i$ can be expressed as

$$\frac{r(k-1) + \lambda}{k} = \frac{\lambda p}{k} = rE, \tag{9}$$

where $E$ is the "efficiency" factor of the balanced incomplete block
design and is equal to

$$E = \frac{\lambda p}{rk} = \frac{p(k-1)}{k(p-1)} = \frac{1 - 1/k}{1 - 1/p}. \tag{10}$$

Equation (8) yields the value of $t_i$ as

$$t_i = \frac{Q_i}{rE}. \tag{11}$$

This result is the same as that obtained in the balanced incomplete
block analysis.

[TABLE 2. General form of the analysis of variance for doubly balanced incomplete block designs. The table body is not legible in the source.]


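By subtracting $\delta \sum_l m_{il} a_{il} = 0$ and adding $\delta \sum_l m_{jl} a_{jl} = 0$
to each $a_{ij}$ equation, the coefficients of all $a$'s except $a_{ij}$ are reduced to
zero. The $a_{ij}$ equation then becomes

$$2(\lambda - \delta)\, a_{ij} = A_{ij} - \lambda(t_i - t_j) = A_{ij} - \frac{k}{p}(Q_i - Q_j) = C_{ij}, \tag{12}$$

so that

$$a_{ij} = \frac{C_{ij}}{2(\lambda - \delta)}. \tag{13}$$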


Analysis of Variance. 


The general form of the analysis of variance table for doubly balanced
incomplete block designs is given in Table 2. The design is set
up with $r$ replications of $p$ treatments in $q$ blocks of $k$ treatments.
Each pair of treatments is together in a block $\lambda$ times and each triplet
of treatments $\delta$ times.

The expressions for the sums of squares are in a form which gives the
easiest computing methods. In (8) $Q_i$ is defined as

$$Q_i = T_i - \frac{1}{k} B_{i\cdot}$$

and in (12), $C_{ij}$ is defined as

$$C_{ij} = A_{ij} - \frac{k}{p}(Q_i - Q_j).$$

$G$, $B_h$, $T_i$ and $A_{ij}$ are given in (4) and $B_{i\cdot}$ in (7) in terms of the
observations.
The sum of squares for blocks, unadjusted for treatment effects, is

$$SSB = \frac{1}{k} \sum_h B_h^2 - \frac{G^2}{rp}. \tag{14}$$

The sum of squares for treatments, adjusted for blocks, is

$$SST = \sum_i t_i Q_i = \frac{1}{rE} \sum_i Q_i^2 = \frac{1}{k^2 rE} \sum_i (kQ_i)^2 \tag{15}$$

because of (11). Computationally it is easier to obtain $kQ_i$ than $Q_i$;
hence, the sum of squares allows for this in Table 2. It should be noted
that SST is also clear of correlation effects; hence, this treatment mean
square can be used to test for adjusted treatment effects.

The sum of squares for correlations, adjusted for treatments, is

$$SSC = \sum_{i<j} a_{ij} C_{ij} = \frac{1}{2(\lambda - \delta)} \sum_{i<j} C_{ij}^2 = \frac{1}{2p^2(\lambda - \delta)} \sum_{i<j} (pC_{ij})^2 \tag{16}$$

because of (13). It is easier to compute $pC_{ij}$ than $C_{ij}$; therefore, the
sum of squares allows for this in Table 2. SSC is also clear of block
effects.

The sum of squares for error is obtained by subtracting the sums of 
squares for blocks (unadjusted), treatments (adjusted) and correla- 
tions (adjusted) from the total sum of squares. 

It is seen from Table 2 that the one-tailed F test is appropriate to
test

$$H_0: \tau_1 = \tau_2 = \cdots = \tau_p = 0$$

and

$$H_0: \alpha_{12} = \alpha_{13} = \cdots = \alpha_{p-1,p} = 0,$$

with $\sigma^2$ estimated by the error mean square, $s^2$.
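Adjusted Treatment Means and Variances.

The adjusted treatment mean for the $i$th treatment is

$$t_i' = t_i + m. \tag{17}$$

The variance of $t_i'$, the adjusted treatment mean, is

$$\sigma^2(t_i') = \sigma^2(t_i) + \sigma^2(m), \tag{18}$$

since the covariance of $t_i$ and $m$ is zero. The variance of $t_i$ is obtained
from

$$t_i = \frac{Q_i}{rE} = \frac{T_i}{rE} - \frac{B_{i\cdot}}{k\,rE}. \tag{19}$$

From this,

$$\sigma^2(t_i) = \frac{(k-1)\,\sigma^2}{k\,rE^2} = \frac{(p-1)\,\sigma^2}{p\,rE}. \tag{20}$$

The variance of the mean is

$$\sigma^2(m) = \frac{\sigma^2}{rp}, \tag{21}$$

so that

$$\sigma^2(t_i') = \frac{(p-1)\,\sigma^2}{p\,rE} + \frac{\sigma^2}{rp} \tag{22}$$

from (18).

The variance of the difference between two adjusted treatment
means is

$$\sigma^2(t_h' - t_i') = \sigma^2(t_h') + \sigma^2(t_i') - 2\sigma(t_h' t_i') = \frac{2k\sigma^2}{\lambda p} = \frac{2\sigma^2}{rE}. \tag{23}$$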

The treatment means and variances given in (17) to (23) are identical 
to those obtained with a balanced incomplete block design. 

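In addition to the treatment means, it may be desirable on some
occasions to obtain a correlation effect, adjusted for treatments. This
is given in (13). In terms of the observations,

$$a_{ij} = \frac{C_{ij}}{2(\lambda - \delta)},$$

so that

$$C_{ij} = 2(\lambda - \delta)\, a_{ij} = A_{ij} - \frac{k}{p} T_i + \frac{k}{p} T_j + \frac{1}{p} B_{i\cdot} - \frac{1}{p} B_{j\cdot}. \tag{24}$$

Collecting the coefficient of each observation in (24): in the $\lambda$ blocks
containing both treatments, $Y_{hi}$ and $Y_{hj}$ have coefficients $(1 - k/p)$ and
$-(1 - k/p)$ and all other observations cancel; in the $(r - \lambda)$ blocks
containing the $i$th but not the $j$th treatment, $Y_{hi}$ has coefficient
$-(k-1)/p$ and each of the other $k - 1$ observations has coefficient $1/p$;
and in the $(r - \lambda)$ blocks containing the $j$th but not the $i$th treatment
the same coefficients occur with opposite signs.

The variance of $C_{ij}$ is, therefore,

$$\sigma^2(C_{ij}) = \left[\, 2\lambda \left(1 - \frac{k}{p}\right)^2 + 2(r - \lambda)\, \frac{k(k-1)}{p^2} \right] \sigma^2. \tag{25}$$

But $\lambda(p - 1) = r(k - 1)$ and therefore

$$r - \lambda = \frac{\lambda(p - k)}{k - 1}.$$

Substituting this result into (25),

$$\sigma^2(C_{ij}) = \frac{2\lambda(p - k)}{p}\, \sigma^2. \tag{26}$$

The variance of an adjusted correlation effect is

$$\sigma^2(a_{ij}) = \frac{\sigma^2(C_{ij})}{4(\lambda - \delta)^2} = \frac{\lambda(p - k)\, \sigma^2}{2p(\lambda - \delta)^2}. \tag{27}$$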
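
The variance of $C_{ij}$ can be checked numerically by collecting the coefficient of every observation. The sketch below is an illustration under stated assumptions: it uses the design with all combinations of six treatments in blocks of four, takes $C_{ij} = A_{ij} - (k/p)(Q_i - Q_j)$ with $Q_i = T_i - B_{i\cdot}/k$, and assumes independent errors of variance $\sigma^2$. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def c_coefficients(blocks, p, k, i, j):
    """Coefficient of each observation Y[h, t] in C_ij = A_ij - (k/p)(Q_i - Q_j)."""
    coef = {}

    def add(h, t, value):
        coef[(h, t)] = coef.get((h, t), Fraction(0)) + value

    for h, block in enumerate(blocks):
        if i in block and j in block:            # A_ij: score differences
            add(h, i, Fraction(1))
            add(h, j, Fraction(-1))
        for a, sign in ((i, -1), (j, 1)):        # -(k/p) Q_i + (k/p) Q_j
            if a in block:
                add(h, a, sign * Fraction(k, p))         # T_a part of Q_a
                for t in block:
                    add(h, t, -sign * Fraction(1, p))    # B_a./k part of Q_a
    return coef


p, k, lam = 6, 4, 6
blocks = list(combinations(range(1, p + 1), k))

# With independent errors, Var(C_ij) / sigma^2 is the sum of squared coefficients.
variance_factor = sum(c * c for c in c_coefficients(blocks, p, k, 1, 2).values())
closed_form = Fraction(2 * lam * (p - k), p)
```

For this design both the direct count and the closed form $2\lambda(p - k)/p$ give 4.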


Comparison with Complete Blocks. 


The complete block design may be considered as a special case of
an incomplete block design in which $p = k$ and $r = q = \lambda = \delta$. In
this case the unadjusted correlation effects are completely confounded
with treatments so that no estimation of them is available. When
complete blocks are run, there is no difficulty in the analysis as the
treatment and error sums of squares in the randomized blocks analysis 
are unbiased and lead to exact tests of significance and confidence 
limits under the assumed correlation pattern. 

With organoleptic testing, as in many other types of experimenta- 
tion, the error variance increases with increasing block size beyond an 
optimum block size. As was suggested in the Introduction, the optimum 
block size for testing most foods is from four to eight, after which the 
error variance may increase very rapidly. If more than four to eight 
treatments are to be tested, a complete block design may give an error 
variance many times that which would be obtained in an incomplete 
block design with k = 3 to 5. If this is the case, the experimenter will 
save considerably in material and effort by using the doubly balanced 
incomplete block design, although the time required for analysis may 
be slightly longer. 

As in the balanced incomplete block design, the variance of the
difference between two treatment means, from (23), is $2\sigma_k^2/rE$ where
$\sigma_k^2$ is the error variance with blocks of size $k$. The variance of the
difference between two treatment means in an ordinary randomized block
experiment with $r$ replications is $2\sigma_p^2/r$. The ratio of these two variances
is

$$\frac{2\sigma_p^2/r}{2\sigma_k^2/rE} = E\, \frac{\sigma_p^2}{\sigma_k^2} = \frac{p(k-1)}{k(p-1)}\, \frac{\sigma_p^2}{\sigma_k^2},$$

and is a measure of the relative efficiency of the two designs. Yates
(1936) has called the fraction

$$\frac{p(k-1)}{k(p-1)}$$

the efficiency factor of the incomplete block design since it measures the
loss of information when there is no change in error variance due to
reduced block size. Because of the large values which $\sigma_p^2/\sigma_k^2$ may take
on in organoleptic testing, the efficiency factor does not have as much
importance as in, say, agricultural field experiments.
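
The comparison can be put into a short calculation. In this sketch the function names are hypothetical and the two error variances are invented for illustration only; they are not estimates from any experiment.

```python
def efficiency_factor(p, k):
    """Efficiency factor E = p(k - 1) / (k(p - 1)) of an incomplete block design."""
    return p * (k - 1) / (k * (p - 1))


def relative_efficiency(p, k, var_complete, var_incomplete):
    """Ratio of the variance of a treatment difference in complete blocks (size p)
    to that in incomplete blocks (size k), for the same number of replications."""
    return efficiency_factor(p, k) * var_complete / var_incomplete


E = efficiency_factor(6, 4)

# Hypothetical error variances: 3.0 with blocks of six, 1.35 with blocks of four.
ratio = relative_efficiency(6, 4, var_complete=3.0, var_incomplete=1.35)
```

With these assumed variances the incomplete block design is twice as efficient even though its efficiency factor is only 0.9.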

There obviously can be no comparison of efficiency of the incomplete
and complete block designs with respect to correlation effects since the
correlation effects cannot be estimated in complete block designs.
However it is illuminating to examine the expectation of the mean
square for correlations to see if the design can affect the variation taken
out by the correlation effects. The mean square is directly proportional
to $2(\lambda - \delta)$ so that by selecting a design with $(\lambda - \delta)$ as large as is
practically possible, increased precision should be obtained. As can
be seen from (32), $(\lambda - \delta)$ is proportional to $(p - k)/(p - 2)$, so that
for a given number of treatments $p$, $(\lambda - \delta)$ is larger as block size $k$
is smaller. This indicates that block size of two* is most efficient, in
this restricted sense, if the correlation effect is an important source of
variation.

Efficiency in this restricted sense presumes that the correlation 
effects remain the same with different block sizes. The correlation 
effects may actually decrease with increasing block size so that block 
size of two may not be better than block size of three, four, or even 
larger. Some empirical evidence is needed on this point. 

It should be noted that a block size of two cannot be used to evaluate 
correlation effects unless the design is repeated at least once. With 
only one replication of the design, $\lambda = 1$ when $k = 2$, thereby leaving
no degrees of freedom for error. 

This discussion on efficiency has ignored any differences in degrees 
of freedom. If the degrees of freedom are less than 20, an adjustment
in the efficiency should be made by the method described by Cochran 
and Cox (1950). 


NUMERICAL EXAMPLE 


Unfortunately none of the experiments which best illustrate the 
correlation effect has used doubly balanced incomplete block designs. 
There were, however, a number of ice cream experiments (Calvin, 1950) 
using all possible combinations of treatments (therefore, in doubly 
balanced incomplete block designs) conducted in cooperation with the 
Dairy Manufacturing Department at North Carolina State College. 
In the example which will be used, six different amounts of vanilla 
constituted the six treatments, and tasters were asked to score the 
samples from 0 to 5 in blocks of four as to the amount of vanilla they
thought to be present in each sample. A score of 0 indicated no vanilla 
and 5 the highest amount. Each taster constituted a separate block, 
no taster being used more than once in the experiment. 

There were two reasons why these data would not be expected to 
show as much correlation within blocks as in other experiments on, 
say, preference. The first reason was that the scoring range was so 
limited, not actually allowing much, if any, added range above treat- 
ment effects. The second reason is that the taste of vanilla is a familiar 
one, even more so in this experiment because the tasters had been 


*Block size of two is a special case of doubly balanced incomplete block designs in which $\delta = 0$,
i.e. all triplets are present an equal number of times, zero.




tasting different amounts in previous tests. Because of this familiarity, 
the treatment consisting of no vanilla was recognized fairly easily, 
reducing the effective number of treatments from six to five and the 
number per block from four to three in ten of the fifteen blocks. 

The symbols used are: 


p = 6 = number of treatments, 

k = 4 = number of samples per block, 

q = 15 = number of blocks or tasters, 

r = 10 = number of replications, 

$\lambda$ = 6 = number of times each pair of treatments are in the
same block,

$\delta$ = 3 = number of times each triplet of treatments are in the
same block,

E = 0.9 = efficiency factor. 


The experimental design and scores for amount of vanilla in ice 
cream are given in Table 3. 


TABLE 3
SCORES FOR AMOUNT OF VANILLA IN ICE CREAM

                          Treatments
Block           1      2      3      4      5      6     B_h
  1             2      4      0      3      -      -       9
  2             0      4      3      -      5      -      12
  3             0      3      4      -      -      5      12
  4             1      3      -      4      5      -      13
  5             0      2      -      4      -      5      11
  6             0      1      -      -      4      5      10
  7             2      -      3      1      4      -      10
  8             0      -      0      3      -      4       7
  9             0      -      2      -      3      3       8
 10             2      -      -      4      1      4      11
 11             -      0      2      3      1      -       6
 12             -      1      1      2      -      4       8
 13             -      2      3      -      5      4      14
 14             -      1      -      4      5      2      12
 15             -      -      4      3      1      2      10
T_i             7     21     22     31     34     38     153
B_i.          103    107     96     97    106    103     612
4Q_i          -75    -23     -8     27     30     49       0
4Q_i/36     -2.08  -0.64  -0.22   0.75   0.83   1.36       0

Adj. mean     0.5    1.9    2.3    3.3    3.4    3.9    2.55

(A hyphen marks a treatment not in the block.)




The steps in the analysis are: 


1. Find the block totals, $B_h$, the treatment totals, $T_i$, and the
grand total, $G$. The computations are easiest when in the form of
Table 3.

2. For each treatment, calculate

$B_{i\cdot}$ = total of all blocks in which the $i$th treatment occurs,

$$kQ_i = kT_i - B_{i\cdot} = 4T_i - B_{i\cdot},$$

$$\frac{Q_i}{rE} = \frac{kQ_i}{k\,rE} = \frac{4Q_i}{36}.$$

The total of the $B_{i\cdot}$ should be $kG$ and the total of the $kQ_i$ and the
$kQ_i/k\,rE$ should each be zero.

3. Form a table of all pairs of treatments, such as Table 4. Find
$A_{ij}$ for each pair, where $A_{ij}$ is the difference between scores of the $i$th
and $j$th treatments summed over the blocks in which both treatments
occur. $A_{ij}$ is easy to compute from Table 3 by going down the table
and entering on the left side of a calculator the score for the $i$th treatment
and on the right side the score for the $j$th treatment for all blocks in
which both treatments occur.


TABLE 4
SUMS OF SCORE DIFFERENCES FOR ALL PAIRS OF TREATMENTS

Treatment pairs     A_ij    6C_ij     C_ij
1-2                  -14      -32     -5.3
1-3                   -8       19      3.2
1-4                  -12       30      5.0
1-5                  -17        3      0.5
1-6                  -24      -20     -3.3
2-3                    1       21      3.5
2-4                   -9       -4     -0.7
2-5                  -14      -31     -5.2
2-6                  -15      -18     -3.0
3-4                   -5        5      0.8
3-5                   -2       26      4.3
3-6                   -8        9      1.5
4-5                    2       15      2.5
4-6                   -1       16      2.7
5-6                   -1       13      2.2

4. For each pair of treatments, calculate

$$pC_{ij} = (pA_{ij}) - (kQ_i) + (kQ_j) = 6A_{ij} - (4Q_i) + (4Q_j).$$

For any $i$th treatment

$$\sum_{j > i} C_{ij} - \sum_{l < i} C_{li} = 0,$$

e.g. for treatment 2,

$$(21) + (-4) + (-31) + (-18) - (-32) = 0.$$

5. Compute the sums of squares as:

Total SS

$$= 2^2 + 4^2 + \cdots + 2^2 - 153^2/60 = 547 - 390.15 = 156.85.$$

Block SS (unadjusted)

$$= (B_1^2 + B_2^2 + \cdots + B_{15}^2)/k - G^2/rp$$
$$= (9^2 + 12^2 + \cdots + 10^2)/4 - 153^2/60 = 408.25 - 390.15 = 18.10.$$

Treatment SS (adjusted for blocks)

$$= [(kQ_1)^2 + (kQ_2)^2 + \cdots + (kQ_6)^2]/k^2 rE$$
$$= [(-75)^2 + (-23)^2 + \cdots + (49)^2]/144 = 10{,}248/144 = 71.17.$$

Correlation SS (adjusted for treatments)

$$= [(pC_{12})^2 + (pC_{13})^2 + \cdots + (pC_{56})^2]/2p^2(\lambda - \delta)$$
$$= [(-32)^2 + (19)^2 + \cdots + (13)^2]/216 = 5868/216 = 27.17.$$

Error SS = Total SS - Block SS - Treatment SS - Correlation SS

$$= 156.85 - 18.10 - 71.17 - 27.17 = 40.41.$$


The analysis of variance is as follows: 


Source of variation     d.f.      S.S.     M.S.
Total                    59     156.85
Blocks (unadj.)          14      18.10        -
Treatments (adj.)         5      71.17    14.23
Correlations (adj.)      10      27.17     2.72
Error                    30      40.41     1.35
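
Steps 1 to 5 can be retraced with a short program. This is a sketch rather than the original desk-calculator routine. It assumes that the blocks of Table 3 are the fifteen combinations of six treatments taken four at a time, in lexicographic order, an arrangement that reproduces the treatment totals and the sums of score differences quoted above. The variable names are illustrative.

```python
from itertools import combinations

# Scores of Table 3, block by block, listed in increasing treatment order.
scores = [
    (2, 4, 0, 3), (0, 4, 3, 5), (0, 3, 4, 5), (1, 3, 4, 5), (0, 2, 4, 5),
    (0, 1, 4, 5), (2, 3, 1, 4), (0, 0, 3, 4), (0, 2, 3, 3), (2, 4, 1, 4),
    (0, 2, 3, 1), (1, 1, 2, 4), (2, 3, 5, 4), (1, 4, 5, 2), (4, 3, 1, 2),
]
p, k, lam, delta = 6, 4, 6, 3

# Assumed layout: block h holds the h-th 4-subset of treatments 1..6.
blocks = [dict(zip(trts, y))
          for trts, y in zip(combinations(range(1, p + 1), k), scores)]
q = len(blocks)
r = q * k // p

# Steps 1 and 2: totals and kQ_i = k T_i - B_i.
B = [sum(b.values()) for b in blocks]
G = sum(B)
T = {i: sum(b[i] for b in blocks if i in b) for i in range(1, p + 1)}
B_dot = {i: sum(Bh for b, Bh in zip(blocks, B) if i in b) for i in T}
kQ = {i: k * T[i] - B_dot[i] for i in T}

# Steps 3 and 4: A_ij and pC_ij = p A_ij - kQ_i + kQ_j.
A = {(i, j): sum(b[i] - b[j] for b in blocks if i in b and j in b)
     for i, j in combinations(T, 2)}
pC = {(i, j): p * A[i, j] - kQ[i] + kQ[j] for (i, j) in A}

# Step 5: sums of squares.
correction = G ** 2 / (r * p)
ss_total = sum(y * y for b in blocks for y in b.values()) - correction
ss_blocks = sum(Bh ** 2 for Bh in B) / k - correction
ss_treat = sum(v ** 2 for v in kQ.values()) / (k * lam * p)   # k^2 rE = k*lam*p
ss_corr = sum(v ** 2 for v in pC.values()) / (2 * p ** 2 * (lam - delta))
ss_error = ss_total - ss_blocks - ss_treat - ss_corr

df_error = 30   # 59 - 14 - 5 - 10, as in the analysis of variance table
ms_error = ss_error / df_error
```

The computed sums of squares agree with the values in the text; the error sum of squares differs in the second decimal place only because the text subtracts rounded figures.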




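6. To adjust the treatment means, calculate

$$t_i' = m + t_i = \frac{G}{rp} + \frac{kQ_i}{k\,rE} = 2.55 + \frac{4Q_i}{36}.$$

The adjusted treatment means are given in Table 3.

7. The standard error of the difference between two adjusted means is

$$\sqrt{\frac{2s^2}{rE}} = \sqrt{\frac{2(1.35)}{9}} = 0.55.$$

8. The $C_{ij}$ are proportional to the adjusted correlation effects and
hence may be compared directly. The standard error of any $C_{ij}$ is
given by

$$\sqrt{\frac{2\lambda(p - k)\,s^2}{p}} = \sqrt{\frac{(24)(1.35)}{6}} = 2.32.$$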
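
Steps 6 to 8 amount to a few lines of arithmetic, sketched below with the totals and error mean square of the example. The standard error of $C_{ij}$ is computed from the variance $2\lambda(p - k)\sigma^2/p$; that expression is taken here as an assumption about the intended formula, since it is the one consistent with the figures (24) and 1.35 in step 8.

```python
from math import sqrt

p, k, r, lam = 6, 4, 10, 6
G = 153                                  # grand total from Table 3
s2 = 1.35                                # error mean square
kQ = [-75, -23, -8, 27, 30, 49]          # 4Q_i row of Table 3

rE = lam * p / k                         # coefficient of t_i, equal to 9
mean = G / (r * p)                       # general mean, 2.55

# Step 6: adjusted means m + kQ_i / (k rE), rounded as in Table 3.
adjusted_means = [round(mean + v / (k * rE), 1) for v in kQ]

# Step 7: standard error of the difference between two adjusted means.
se_difference = sqrt(2 * s2 / rE)

# Step 8: standard error of any C_ij (assumed variance 2*lam*(p - k)*s2 / p).
se_C = sqrt(2 * lam * (p - k) * s2 / p)
```

The adjusted means reproduce the last row of Table 3.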


CONSTRUCTION OF DOUBLY BALANCED INCOMPLETE BLOCK DESIGNS 


A large number of balanced incomplete block designs have been 
enumerated in the literature (Bose, 1939, 1943; Cox, 1940; Cochran 
and Cox, 1950; Fisher, 1940, 1942; Fisher and Yates, 1948; Yates, 
1936). It would seem likely that some of them would possess the 
property of double balance. It is obvious that designs which include 
all possible combinations of p treatments taken k at a time will have 
balance both for pairs and triplets of treatments. Aside from the 
designs with all possible combinations, there appears to be only one 
other design in the literature which has balance for triplets. This one
is for $p = 8$, $k = 4$, $r = 7$, $q = 14$, $\lambda = 3$, $\delta = 1$.
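
A design with these parameters can be written down and checked by machine. The construction used below is an assumption made for illustration and is not necessarily how the published design was obtained: the eight treatments are labelled 0 to 7, and a set of four forms a block when the bitwise exclusive-or of its labels is zero.

```python
from collections import Counter
from itertools import combinations

points = range(8)   # treatments labelled 0..7 (hypothetical labelling)

# A 4-subset is a block when the exclusive-or of its labels vanishes.
blocks = [b for b in combinations(points, 4)
          if b[0] ^ b[1] ^ b[2] ^ b[3] == 0]

replications = Counter(t for b in blocks for t in b)
pair_counts = Counter(pair for b in blocks for pair in combinations(b, 2))
triplet_counts = Counter(tr for b in blocks for tr in combinations(b, 3))
```

Every triplet determines its fourth member uniquely, which is why each of the 56 triplets falls in exactly one of the 14 blocks.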

One reason for the absence of DBIB designs is that practically all 
balanced incomplete block designs previously listed have had imposed 
the limitation that $r \le 10$. In agricultural experimentation this limitation
is a practical one in that an experimenter rarely can afford to
use more material or effort on a single treatment. In organoleptic
testing, however, the limitation that $r \le 10$ is a needless one. The
experimenter is usually willing to have a large number of replications, 
because he wants his results to be representative of a large population 
of tasters. Therefore in constructing DBIB designs, any limitation 
put on the number of replications should not be below r = 50 and 
perhaps not there. 

The foods experimenter ordinarily wants a design for which $k \le 8$.
The reasons for this were discussed in the Introduction. In addition
there are seldom very large numbers of treatments to be compared.
A practical upper limit can probably be put at $p = 20$. With a large
number of treatments, the analysis becomes time-consuming because
of the large number of pairs of treatments. This gives another practical
reason for having $p \le 20$.

Mathematically, the construction of balanced incomplete block 
designs is a part of the theory of configurations. A configuration is an 
assemblage of elements into sets, each element occurring in the same 
number of sets, and each set containing the same number of elements. 

A configuration satisfying both conditions of balance can be obtained 
by writing down all possible combinations of p elements taken k at a 
time. The total number of combinations is 


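$$q = \binom{p}{k} = \frac{p!}{k!\,(p - k)!}. \tag{28}$$

In these $q$ sets, or blocks, of $k$ elements each, each of the $p$ elements occurs
in $r$ sets, each pair of elements occurs together in a set $\lambda$ times, and each
triplet of elements occurs together in a set $\delta$ times. The values of
$r$, $\lambda$ and $\delta$ are given respectively by

$$r = \binom{p-1}{k-1} = \frac{(p - 1)!}{(k - 1)!\,(p - k)!}, \tag{29}$$

$$\lambda = \binom{p-2}{k-2} = \frac{(p - 2)!}{(k - 2)!\,(p - k)!}, \tag{30}$$

$$\delta = \binom{p-3}{k-3} = \frac{(p - 3)!}{(k - 3)!\,(p - k)!}. \tag{31}$$

The relationships among the design parameters may be rewritten as

$$qk = rp, \qquad r(k - 1) = \lambda(p - 1), \qquad \lambda(k - 2) = \delta(p - 2). \tag{32}$$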


The first of these equations expresses the fact that the total number 
of units must be equal both to the product of the number of sets by 
the number per set and to the product of the number of replicates by 
the number of elements; the second, that the number of samples each 
element occurs with must be equal to $\lambda$ times the remaining number
of elements; and the third, that the number of samples each pair of
elements occurs with must be equal to $\delta$ times the remaining number
of elements. 
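
The parameters of an unreduced design and the three relationships can be computed directly. The sketch below uses hypothetical function names; for block size two it takes the number of triplet occurrences to be zero.

```python
from math import comb


def unreduced_parameters(p, k):
    """q, r, lambda and delta when all combinations of p elements k at a time are used."""
    return {
        "q": comb(p, k),
        "r": comb(p - 1, k - 1),
        "lambda": comb(p - 2, k - 2),
        "delta": comb(p - 3, k - 3) if k >= 3 else 0,
    }


def satisfies_relations(p, k, q, r, lam, delta):
    """Check the three relationships among the design parameters."""
    return (q * k == r * p
            and r * (k - 1) == lam * (p - 1)
            and lam * (k - 2) == delta * (p - 2))


params = unreduced_parameters(6, 4)
ok = satisfies_relations(6, 4, params["q"], params["r"],
                         params["lambda"], params["delta"])
```

For six treatments in blocks of four this gives q = 15, r = 10, lambda = 6 and delta = 3, the design of the numerical example.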


Equations (28) to (31) give the values of $q$, $r$, $\lambda$, and $\delta$ for all possible
combinations of the $p$ elements taken $k$ at a time. It is not always
practical or possible in experimental work to use these unreduced
designs. If equations (28) to (31) give values of $q$, $r$, $\lambda$ and $\delta$ which
have a common factor, then dividing through by the common factor
gives new reduced values of $q$, $r$, $\lambda$, and $\delta$ which satisfy (32) and may
also yield a design giving balance on both pairs and triplets. Even 
though integral values of the design parameters may satisfy (32), this 
does not guarantee that there is a design with these parameters which 
has double balance. 
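
The reduction by a common factor is easily mechanized. The sketch below, with a hypothetical function name and valid for $k \ge 3$, returns only candidate parameters; as noted above, it says nothing about whether a design with those parameters exists.

```python
from functools import reduce
from math import comb, gcd


def reduced_parameters(p, k):
    """Divide the unreduced q, r, lambda, delta by their greatest common factor.

    Returns ((q, r, lambda, delta), common_factor); requires k >= 3.
    """
    values = (comb(p, k), comb(p - 1, k - 1),
              comb(p - 2, k - 2), comb(p - 3, k - 3))
    factor = reduce(gcd, values)
    return tuple(v // factor for v in values), factor


eight_by_four, factor_8 = reduced_parameters(8, 4)
ten_by_five, factor_10 = reduced_parameters(10, 5)
```

For $p = 8$, $k = 4$ the unreduced values 70, 35, 15, 5 reduce to 14, 7, 3, 1; for $p = 10$, $k = 5$ the values 252, 126, 56, 21 reduce to 36, 18, 8, 3.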

Although a number of workers (Bhattacharya, 1944a, 1944b, 1946; 
Bose, 1937, 1942; Carmichael, 1937; Cox, 1940; Fisher, 1940, 1942; 
Yates, 1936) have given methods for constructing balanced incomplete 
block designs, only Carmichael among these has given any methods 
which may be used directly to construct incomplete block designs with 
double balance. 

Carmichael’s solution to the problem is concerned only with the 
case in which $k = 4$ and $\delta = 1$. In this case $p$ must be of the form
$6m + 2$ or $6m + 4$ where $m$ is an integer, although he states that the
general problem of the existence of such designs appears not to have
been solved. His method is to start with four elements in a single set.
By adding another set of four new elements and pairing all pairs from
each of the two sets, a design is constructed for $p = 8$ and $k = 4$.
Another design for $p = 16$ can be constructed by taking eight more
elements and following the same procedure. Thus one can obtain
designs for $p = 2^n$ for $n = 3, 4, 5, \ldots$ in blocks of size $k = 4$.

Bose (1953) has pointed out that any DBIB design with design
parameters p, k, r, q, $\lambda$, and $\delta$ must be of a form such that all blocks
not containing a given treatment, a say, form a balanced incomplete
block design with $p' = p - 1$, $k' = k$, $r' = r - \lambda$, $q' = q - r$ and
$\lambda' = \lambda - \delta$. Similarly, all the blocks containing a given treatment,
a say, form an incomplete block design, if treatment a is deleted, in
which $p'' = p - 1$, $k'' = k - 1$, $r'' = \lambda$, $q'' = r$ and $\lambda'' = \delta$. By
combining two balanced incomplete block designs satisfying these
restrictions it is sometimes possible to obtain a DBIB design.
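
Bose's observation can be illustrated on Plan 1 of the Appendix. The sketch below is an illustration only, with hypothetical helper names `bib_parameters` and `split_on`: it splits the 14 blocks on treatment 8 and checks that each part is balanced on pairs, with parameters that follow from p = 8, k = 4, q = 14, r = 7, pair count 3 and triplet count 1.

```python
from collections import Counter
from itertools import combinations

# Plan 1 of the Appendix (p = 8, k = 4), transcribed by hand for this sketch.
PLAN_1 = [
    (1, 2, 3, 4), (5, 6, 7, 8), (1, 2, 7, 8), (3, 4, 5, 6),
    (1, 3, 6, 8), (2, 4, 5, 7), (1, 4, 6, 7), (2, 3, 5, 8),
    (1, 2, 5, 6), (3, 4, 7, 8), (1, 3, 5, 7), (2, 4, 6, 8),
    (1, 4, 5, 8), (2, 3, 6, 7),
]


def bib_parameters(blocks):
    """Return (p, k, q, r, lam) of a balanced incomplete block design."""
    treatments = sorted({t for block in blocks for t in block})
    sizes = {len(block) for block in blocks}
    singles = Counter(t for block in blocks for t in block)
    pairs = Counter(
        pair for block in blocks for pair in combinations(sorted(block), 2)
    )
    r_values = {singles[t] for t in treatments}
    lam_values = {pairs[pair] for pair in combinations(treatments, 2)}
    if len(sizes) != 1 or len(r_values) != 1 or len(lam_values) != 1:
        raise ValueError("not a balanced incomplete block design")
    return len(treatments), sizes.pop(), len(blocks), r_values.pop(), lam_values.pop()


def split_on(blocks, a):
    """Blocks without treatment a, and blocks with a after a is deleted."""
    without_a = [block for block in blocks if a not in block]
    with_a_deleted = [
        tuple(t for t in block if t != a) for block in blocks if a in block
    ]
    return without_a, with_a_deleted


without_8, with_8_deleted = split_on(PLAN_1, 8)
print(bib_parameters(without_8))       # p', k', q', r', lambda'
print(bib_parameters(with_8_deleted))  # p'', k'', q'', r'', lambda''
```

Here the blocks without treatment 8 have r' = 7 - 3 = 4 and pair count 3 - 1 = 2, and the blocks that contained treatment 8 reduce to a design in blocks of three with r'' = 3 and pair count 1.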

An example of a DBIB design constructed in this manner is Plan 3
in the Appendix. The design parameters are p = 10, k = 5, q = 36,
r = 18, $\lambda = 8$ and $\delta = 3$. Plan 11.12 given by Cochran and Cox (1950)
has $p' = p - 1 = 9$, $k' = k = 5$, $r' = r - \lambda = 10$, $q' = q - r = 18$
and $\lambda' = \lambda - \delta = 5$. Their Plan 11.11 has $p'' = p - 1 = 9$,
$k'' = k - 1 = 4$, $r'' = \lambda = 8$, $q'' = r = 18$ and $\lambda'' = \delta = 3$. By adding
treatment 10 to each of the 18 blocks in Plan 11.11 and adding these blocks
to the 18 of Plan 11.12, the 36 blocks for Plan 3 are obtained which
satisfy equations (32).

Plans 5 and 6 were also constructed by this method from Plans 
11.19 and 11.20, and Plans 11.25 and 11.26, respectively, given by 
Cochran and Cox. Plan 5 is also an ordinary balanced incomplete 
block design with maximum reduction. 

Plan 2, however, could not be constructed by this procedure, al- 
though Plans 10.1 and 11.11 given by Cochran and Cox satisfy the 
necessary conditions. When treatment 10 is added to each block in 
Plan 10.1, adding the two designs gives one for which $\delta$ is not constant
for all triplets of treatments. 

Plan 2 was constructed by a type of elimination process. For
example, treatment 10 was entered in the first twelve blocks since
r = 12. Treatment 9 was entered in the first four blocks since $\lambda = 4$.
Because $\delta = 1$, the remaining eight treatments are entered, each once,
into the eight remaining vacancies in the first four blocks, with
treatment 8 pairing with treatment 7. In blocks 5 through 7, treatment 8
is entered to complete $\lambda = 4$ for treatments 8 and 10. Treatments 7
and 9 have already been with treatments 8 and 10 and therefore do
not enter these three blocks. The remaining six treatments are entered
in the last six vacancies in blocks 5 through 7, none pairing with the
same treatments as in blocks 1 through 4. Proceeding in this manner
it was possible to construct all 30 blocks.

Block “complements” (Cox, 1940) from Plan 2 were used to
construct Plan 4. From a design having k ≠ p/2, a second one
can be obtained for the same number of treatments, in blocks of p - k
units. This is done by replacing each block by its complement, i.e.
by a block containing all the treatments missing from the original
block. For every block in Plan 2 there is a block in Plan 4 which
contains the six treatments missing from the block in Plan 2.
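
The complement construction is short enough to sketch in code. The block list below is Plan 2 as read from the Appendix (transcribed by hand, so any transcription slip is this sketch's and not the paper's); `complement_design` and `balance_counts` are hypothetical helper names. The check confirms that the complements have the replication, pair and triplet counts listed for Plan 4.

```python
from collections import Counter
from itertools import combinations

# Plan 2 of the Appendix (p = 10, k = 4), transcribed by hand for this sketch.
PLAN_2 = [
    (7, 8, 9, 10), (3, 6, 9, 10), (1, 5, 9, 10), (2, 4, 9, 10), (2, 5, 8, 10),
    (3, 4, 8, 10), (1, 6, 8, 10), (1, 4, 7, 10), (3, 5, 7, 10), (2, 6, 7, 10),
    (4, 5, 6, 10), (1, 2, 3, 10), (1, 4, 8, 9), (2, 3, 8, 9), (5, 6, 8, 9),
    (1, 3, 7, 9), (2, 5, 7, 9), (4, 6, 7, 9), (3, 4, 5, 9), (1, 2, 6, 9),
    (1, 2, 7, 8), (3, 6, 7, 8), (4, 5, 7, 8), (2, 4, 6, 8), (1, 3, 5, 8),
    (1, 5, 6, 7), (2, 3, 4, 7), (2, 3, 5, 6), (1, 3, 4, 6), (1, 2, 4, 5),
]


def complement_design(blocks, p):
    """Replace each block by the treatments 1..p that are missing from it."""
    everything = set(range(1, p + 1))
    return [tuple(sorted(everything - set(block))) for block in blocks]


def balance_counts(blocks, p):
    """Return (r, lam, delta); raise ValueError if any of them is not constant."""
    treatments = range(1, p + 1)
    constants = []
    for size in (1, 2, 3):
        seen = Counter(
            subset
            for block in blocks
            for subset in combinations(sorted(block), size)
        )
        values = {seen[subset] for subset in combinations(treatments, size)}
        if len(values) != 1:
            raise ValueError(f"subsets of size {size} do not occur equally often")
        constants.append(values.pop())
    return tuple(constants)


plan_4_blocks = complement_design(PLAN_2, 10)
print(balance_counts(PLAN_2, 10))         # counts for the blocks of four
print(balance_counts(plan_4_blocks, 10))  # counts for the blocks of six
```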

Plan 1 is the same as Plan 11.10 given by Cochran and Cox. 

The unreduced designs have much greater utility in organoleptic 
testing than in agricultural experimentation, because it is possible to 
use larger numbers of blocks and replications. These unreduced
designs have not been listed, since they can be constructed by forming 
all possible combinations of k treatments at a time. 

In Table 5 are listed all reduced configurations for p ≤ 16 and
k ≤ 8 which satisfy equations (32). These configurations have maximum
reduction possible for each p and k. Although many of them will have
values of q and r which are too large for most experiments, it would
be desirable to have all possible designs constructed for which q < 100.


TABLE 5
VALUES OF THE DESIGN PARAMETERS FOR ALL REDUCED CONFIGURATIONS WITH
p ≤ 16 AND k ≤ 8 WHICH SATISFY EQUATIONS (32)

 p    k     q     r     λ     δ
 8    4    14     7     3     1*
10    4    30    12     4     1**
      5    36    18     8     3**
      6    30    18    10     5**
11    4   165    60    18     4
      5    33    15     6     2
      6    33    18     9     4
      7   165   105    63    35
12    4   165    55    15     3
      5   132    55    20     6
      6    22    11     5     2**
      7   132    77    42    21
      8   165   110    70    42
13    4   143    44    11     2
      5   429   165    55    15
      6   286   132    55    20
      7   286   154    77    35
      8   429   264   154    84
14    4    91    26     6     1
      5   182    65    20     5
      6    91    39    15     5
      7    52    26    12     5
      8    91    52    28    14
15    5   273    91    26     6
      6   455   182    65    20
      7   195    91    39    15
      8   195   104    52    24
16    4   140    35     7     1***
      5   336   105    28     6
      6    56    21     7     2
      7    80    35    14     5
      8    30    15     7     3**

*Plan 1 in the Appendix and Plan 11.10 (Cochran and Cox, 1950).
**Plan given in the Appendix and not previously presented in the literature.
***This plan can be constructed by the method given by Carmichael (1937).


Acknowledgments. 


I should like to thank Drs. R. L. Anderson and H. L. Lucas for
their invaluable criticism and advice and for their helpful suggestions 
after reading a draft of this paper. 




BIBLIOGRAPHY 


Anderson, R. L. and Bancroft, T. A. 1952. Statistical theory in research. McGraw- 
Hill Book Co., Inc., New York. 

Bengtsson, K. and Helm, E. 1946. Principles of taste testing. Wallerstein Lab. 
Comm. 9: 171-180. 

Bhattacharya, K. N. 1944a. A new balanced incomplete block design. Science and 
Culture. 9: 508. 

Bhattacharya, K. N. 1944b. On a new symmetrical balanced incomplete block 
design. Bull. Calcutta Math. Soc. 36: 91-96. 

Bhattacharya, K. N. 1946. A new solution in balanced incomplete block design. 
Sankhya. 7: 423-424. 

Boggs, M. 1951. Proceedings of Conference on Sensory methods for measuring 
differences in food quality. Bur. Human Nutrition and Home Econ. Agri. Inf. 
Bull. 34: 94. 

Bose, R. C. 1939. On the construction of balanced incomplete block designs. Annals
of Eugenics. 9: 353-399. 

Bose, R. C. 1942. On some new series of balanced incomplete block designs. Bull. 
Calcutta Math. Soc. 34: 17-31. 

Bose, R. C. 1953. Construction of doubly balanced incomplete block designs. Pri- 
vate communication. 


APPENDIX 
PLANS FOR REDUCED DOUBLY BALANCED INCOMPLETE BLOCK DESIGNS 


Plan 1
p = 8,  q = 14,  λ = 3,
k = 4,  r = 7,   δ = 1,

Block   Treatments
  1     1  2  3  4
  2     5  6  7  8
  3     1  2  7  8
  4     3  4  5  6
  5     1  3  6  8
  6     2  4  5  7
  7     1  4  6  7
  8     2  3  5  8
  9     1  2  5  6
 10     3  4  7  8
 11     1  3  5  7
 12     2  4  6  8
 13     1  4  5  8
 14     2  3  6  7




Bradley, R. A. and Terry, M. E. 1952. Rank analysis of incomplete block designs. 
I. The method of paired comparisons. Biom. 39: 324-345. 

Calvin, L. D. 1950. The use of balanced incomplete block designs in ice cream experi- 
ments. Unpublished research. 

Carmichael, R. D. 1937. Introduction to the theory of groups of finite order. Ginn and 
Co., Boston. 

Cochran, W. G. and Cox, G. M. 1950. Experimental designs. John Wiley and Sons, 
Inc., New York. 

Cox, G. M. 1940. Enumeration and construction of balanced incomplete block 
configurations. Ann. Math. Stat. 11: 72-85. 

Crocker, E. C. 1945. Flavor. McGraw-Hill Book Co., Inc., New York. 


Plan 2
p = 10,  q = 30,  λ = 4,
k = 4,   r = 12,  δ = 1,

Block   Treatments
  1     7  8  9 10
  2     3  6  9 10
  3     1  5  9 10
  4     2  4  9 10
  5     2  5  8 10
  6     3  4  8 10
  7     1  6  8 10
  8     1  4  7 10
  9     3  5  7 10
 10     2  6  7 10
 11     4  5  6 10
 12     1  2  3 10
 13     1  4  8  9
 14     2  3  8  9
 15     5  6  8  9
 16     1  3  7  9
 17     2  5  7  9
 18     4  6  7  9
 19     3  4  5  9
 20     1  2  6  9
 21     1  2  7  8
 22     3  6  7  8
 23     4  5  7  8
 24     2  4  6  8
 25     1  3  5  8
 26     1  5  6  7
 27     2  3  4  7
 28     2  3  5  6
 29     1  3  4  6
 30     1  2  4  5




Dove, W. F. 1943. The relative nature of human preference: with an example in
the palatability of different varieties of sweet corn. Jour. Comp. Psych. 35:
219-226.

Durbin, J. 1951. Incomplete blocks in ranking experiments. Brit. Jour. Psych.
(Stat. Sect.). 4: 85-90.


Plan 3
p = 10,  q = 36,  λ = 8,
k = 5,   r = 18,  δ = 3,

Block   Treatments
  1     1  2  3  4 10
  2     1  2  5  6 10
  3     1  2  7  8 10
  4     1  3  5  7 10
  5     1  4  6  8 10
  6     1  3  6  9 10
  7     1  4  8  9 10
  8     1  5  7  9 10
  9     2  3  8  9 10
 10     2  4  5  9 10
 11     2  6  7  9 10
 12     2  3  4  7 10
 13     2  5  6  8 10
 14     3  5  8  9 10
 15     4  6  7  9 10
 16     3  4  5  6 10
 17     3  6  7  8 10
 18     4  5  7  8 10
 19     1  3  6  7  8
 20     2  3  4  6  8
 21     2  4  5  7  8
 22     5  6  7  8  9
 23     3  4  5  6  9
 24     2  4  6  8  9
 25     1  3  4  7  9
 26     1  2  3  6  9
 27     1  2  3  5  8
 28     1  2  4  5  9
 29     3  4  7  8  9
 30     2  3  5  6  7
 31     1  3  4  5  8
 32     1  2  4  6  7
 33     1  4  5  6  7
 34     2  3  5  7  9
 35     1  2  7  8  9
 36     1  5  6  8  9




Eisenhart, C. 1947. The assumptions underlying the analysis of variance. Bio- 
metrics. 3: 1-21. 

Fisher, R. A. 1940. An examination of the different possible solutions of a problem 
in incomplete blocks. Annals of Eugenics. 10: 52-75. 

Fisher, R. A. 1942. New cyclic solutions to problems in incomplete blocks. Annals 
of Eugenics. 11: 290-299. 

Fisher, R. A. and Yates, F. 1948. Statistical tables for biological, agricultural and 
medical research. 3rd Ed. Oliver and Boyd, London. 

Galinat, W. C. and Everett, H. L. 1949. A technique for testing flavor in sweet corn. 
Agron. Jour. 41: 443-445. 


Plan 4
p = 10,  q = 30,  λ = 10,
k = 6,   r = 18,  δ = 5,

Block   Treatments
  1     1  2  3  4  5  6
  2     1  2  4  5  7  8
  3     2  3  4  6  7  8
  4     1  3  5  6  7  8
  5     1  3  4  6  7  9
  6     1  2  5  6  7  9
  7     2  3  4  5  7  9
  8     2  3  5  6  8  9
  9     1  2  4  6  8  9
 10     1  3  4  5  8  9
 11     1  2  3  7  8  9
 12     4  5  6  7  8  9
 13     2  3  5  6  7 10
 14     1  4  5  6  7 10
 15     1  2  3  4  7 10
 16     2  4  5  6  8 10
 17     1  3  4  6  8 10
 18     1  2  3  5  8 10
 19     1  2  6  7  8 10
 20     3  4  5  7  8 10
 21     3  4  5  6  9 10
 22     1  2  4  5  9 10
 23     1  2  3  6  9 10
 24     1  3  5  7  9 10
 25     2  4  6  7  9 10
 26     2  3  4  8  9 10
 27     1  5  6  8  9 10
 28     1  4  7  8  9 10
 29     2  5  7  8  9 10
 30     3  6  7  8  9 10




Greenwood, M. L., Potgieter, M. and Bliss, C. I. 1951. The effect of certain pre- 
freezing treatments on the quality of eight varieties of cultivated highbush blue- 
berries. Food Research. 16: 154-160. 

Hanson, H. L., Kline, L. and Lineweaver, H. 1951. Application of balanced
incomplete block design to scoring of ten dried egg samples. Food Tech. 5: 9-13.

Harrison, S. and Elder, L. W. 1950. Some applications of statistics to laboratory
taste testing. Food Tech. 4: 434-439.

Hopkins, J. W. 1950. A procedure for quantifying subjective appraisals of odor, 
flavor and texture of foodstuffs. Biometrics. 6: 1-16. 

Hopkins, J. W. 1953. Laboratory flavor scoring: two experiments in incomplete 
blocks. Biometrics. 9: 1-21. 

Kendall, M. G. 1948. Rank correlation methods. Charles Griffin and Co., Ltd., 
London. 

MacLean, J. A. R. and Wickens, R. 1951. Application of an incomplete block design 
to the assessment of quality in cacao. Nature 168: 434. 

Smith, C. F., Jones, I. D. and Calvin, L.D. 1950. Effect of insecticides on the flavor 
of peaches—1949. Jour. Econ. Ent. 43: 179-181. 

Smith, C. F., Jones, I. D. and Rigney, J. A. 1949. The effect of insecticides on the 
flavor of peaches. Jour. Econ. Ent. 42: 618-623. 


Plan 5
p = 12,  q = 22,  λ = 5,
k = 6,   r = 11,  δ = 2,

Block   Treatments
  1     1  2  3  5  8 12
  2     2  3  4  6  9 12
  3     3  4  5  7 10 12
  4     4  5  6  8 11 12
  5     1  5  6  7  9 12
  6     2  6  7  8 10 12
  7     3  7  8  9 11 12
  8     1  4  8  9 10 12
  9     2  5  9 10 11 12
 10     1  3  6 10 11 12
 11     1  2  4  7 11 12
 12     4  6  7  9 10 11
 13     1  5  7  8 10 11
 14     1  2  6  8  9 11
 15     1  2  3  7  9 10
 16     2  3  4  8 10 11
 17     1  3  4  5  9 11
 18     1  2  4  5  6 10
 19     2  3  5  6  7 11
 20     1  3  4  6  7  8
 21     2  4  5  7  8  9
 22     3  5  6  8  9 10




U.S. Bureau of Human Nutrition and Home Economics. 1951. Proceedings of
Conference on Sensory methods for measuring differences in food quality. Agri.
Inf. Bull. 34.

Yates, F. 1936. Incomplete randomized blocks. Annals of Eugenics. 7: 121-140.


Plan 6
p = 16,  q = 30,  λ = 7,
k = 8,   r = 15,  δ = 3,
Block Treatments
ct | 3 4/656 | 6 | 7 16 
1 | 2 3 8 | 9 | 10 16 
4 1 4 5 14 15 16 
5 | 1 4 5 0 | HU 12 13 16 
so ee. 7 | 8 | 9 12 | 13 16 
7 | 1 | 6 7 10 | 4s | 15 16 
8 2 6 8 0 | | 15 16 
9| 2] 4 6 9 | WM | 12 14 16 
10 | ; | 8s 7 | 8 | 10 | 12 | 14 16 
7 | 9 | | | 15 16 
12 | 3 4 7 | 14 16 
13 | 3 4 | 15 16 
“i 2 5 6 | s | ou 12 15 16 
| 3 | 5 | 10 13 14 16 
6 | 8 | 9 | 0 | WW | 22 13 14 15 
9 | 2 3 | 6 7 | 10 11 12 13 
20 | 2 3 | 6 7 | 8 9 14 15 
21 | 2 si « 5 | 10 11 14 15 
22 5 | 8 9 12 13 
23 | 1 | 12 4 
38 5 | 7 8 10 =| 13 15 
25 | 1 | 3 9 2B 15 
2] 1 | 38 | 4 | 6 8 | 10 | 12 14 
a | 1] 8 6 9 | 10 | 12 15 
si 4 ) 24 4 | 7 10 | 13 14 


FIXED-SAMPLE-SIZE ANALYSIS OF SEQUENTIAL 
OBSERVATIONS 


F. J. ANSCOMBE 
Statistical Laboratory, Cambridge, England 


The methods most commonly employed for the statistical analysis 
of observations are based on the assumption that the number of ob- 
servations was decided on in advance. The number of observations is 
indeed chosen in advance in many types of experiment or observational 
inquiry. In an agricultural field experiment, the number of plots and 
their treatments must be completely specified long before any observa- 
tions can be taken; and (apart from possible failures which will be 
recorded as “missing observations’) the number of observations even- 
tually obtained is precisely the number chosen at the outset. Many 
chemical determinations are made in triplicate or quadruplicate or 
some other fixed number of times, according to a definite rule; and so 
the number of readings obtained in each determination is fixed in 
advance. A sample survey of households in a town may be based on 
inquiries at every hundredth house in a list; here again the number of 
observations is fixed in advance, provided we include the instances of 
non-response. 

There are other sorts of inquiry where the number of observations 
is not fixed in advance, and where the experimenter, if asked just 
before he began, would not be able to say how many observations he 
would take. He may, for example, be following some recognized type 
of sequential sampling rule, such as one of A. Wald’s sequential tests, 
or the inverse sampling of J. B. S. Haldane and M. C. K. Tweedie. 
In that case, when reporting his observations he will presumably state 
what the sampling rule was, and he will use a method of statistical 
analysis specially designed for that sampling rule. 

Most commonly, however, when the number of observations is 
not fixed in advance, this is because at the outset the experimenter 
has not fully made up his mind as to his requirements or resources or 
the nature of the material being studied; and so he does not decide to 
make a certain fixed number of observations nor to follow a definite 
sequential sampling rule, but proposes simply to take observations 
until such time as it shall seem appropriate to stop. Sometimes indeed
the experimenter does follow a definable sampling rule, but as it is not
of an orthodox sequential type, he does not think of it as “sequential”. 
It is convenient to call any such observations, taken successively and 
with the total number not preassigned, sequential observations, even 
though the sample size was not determined by a well-defined sequential 
sampling rule consciously adopted by the experimenter. Such observa- 
tions are sometimes reported without any statement as to how their 
number was determined, the experimenter considering this to be an 
irrelevant detail; and it is usual to treat the observations in the statistical 
analysis as if their number had been fixed in advance. The purpose 
of this paper is to consider a number of different situations of this sort, 
to see whether any serious error will result from a fixed-sample-size 
analysis. The paper is a continuation and partial summary of two 
recent discussions in [2] and [4], where bibliographies of relevant litera- 
ture may be found. 


Sample Size Independent of the Observations 


Sometimes it happens that, although the sample size is not fixed 
in advance, and depends on various circumstances, it does not depend 
at all on the observations themselves. For example, the observer may 
decide to take as many observations as he can in a limited period of 
time. Provided the time taken over an observation is not correlated in 
any way with the reading obtained, the decision to stop does not depend
on the previous readings. In an investigation of fleas on rodents, the 
observer may decide to catch as many rodents of each species as he 
can, but to discontinue catching any species of which he has already 
caught 100 specimens. With such a sampling procedure, the number of 
animals of any one species caught is an uncertain quantity, but (pre- 
sumably) the number does not depend on the flea populations of the 
animals caught.* The situation would be quite different if the sampling
procedure were modified so that the observer left off catching a species
as soon as he had caught 100 infested specimens; for now the number
of animals caught would depend on how many of the caught animals
were not infested.

*I am suggesting that probably in such a situation the number of animals caught would not be
correlated with the flea populations of the animals caught. This would be the case if the activity of the
animals was not in any way affected by the number of fleas carried. It might still be the case even if the
number of fleas affected activity; for example, if the more heavily infested animals were less active and
less likely to be caught. Then the sample of animals obtained by trapping would be biased in favor of
less heavily infested animals, and this would be so whether the number of animals to be caught was fixed
in advance or determined by the sampling rule stated. The bias would be a property of the trapping
method, not of the statistical sampling rule; and there might in fact be no correlation of the type under
consideration, even though there was a bias. But (just as in the following example of a sample survey
of households) one can imagine circumstances that could produce such a correlation. For example,
if heavily infested animals were gregarious and likely to be caught in a bunch, if at all, while the other
animals were not gregarious, and if heavily and lightly infested animals did not interfere with each
other, then under the sampling rule stated there would be a positive correlation between the total
number of animals caught and the infestations of the animals caught. I presume that an effect of this
sort is in fact unlikely.

As another example, consider again the sample survey of households 
in a town, where the total number of houses visited was fixed in advance. 
The number of houses from which satisfactory replies are obtained is 
an uncertain quantity, but in ordinary circumstances this number will 
not be correlated in any way with the replies themselves. Thus if we 
ignore the non-responses, we can say that the number of observations 
(i.e. of responses) is independent of the observations. One can, however, 
imagine a situation in which that is not the case, and I think it is instruc- 
tive to consider such a situation, to see that the circumstances required 
are rather unusual. Suppose that one of the items of information 
being sought in the survey is the occupation of the householder, and 
suppose that certain questions asked in the inquiry would be deemed 
mischievous by persons of a certain occupation, let us say by the book- 
makers, if the inquiry were brought to their attention. Then the 
inquiry would proceed normally unless and until the house of a book- 
maker was visited. In that event, the householder, while giving the 
required information, would become incensed, and would forthwith 
start a campaign in the press and elsewhere to urge people not to co- 
operate in the inquiry, on some suitable grounds. It might be found 
that for the remainder of the inquiry the proportion of non-responses 
was much higher than it had been before. Then the number of satis- 
factory responses would depend on the responses themselves, since it 
would depend on whether the bookmaking profession was included 
among the occupations observed. 

In any experiment or sampling inquiry where the number of observa- 
tions is an uncertain quantity but does not depend on the observations 
themselves, it is always legitimate to treat the observations in the 
statistical analysis as if their number had been fixed in advance. We 
are then in fact using perfectly correct conditional probability distribu- 
tions. The possibility of error in using a fixed-sample-size analysis of 
sequential observations therefore only occurs when the number of 
observations depends on the observations themselves. There are various 
ways in which such a dependence can be produced. I have already 
mentioned two examples (both akin to ordinary inverse binomial 
sampling; they can be described roughly as sampling to obtain not 
more than 100 infested animals, and not more than one bookmaker, 
respectively). I now consider some further examples in greater detail. 




Sampling to Reach a Foregone Conclusion


Suppose that someone wishes to prove that a coin is biased. He
may adopt the following procedure. He spins the coin repeatedly and
keeps count of the numbers of heads and tails. He stops sampling
when first the difference between the cumulated numbers of heads and
tails is significant at some preassigned significance level, let us say at
the 5% level, when the difference is tested as for a fixed sample size
by reference to a binomial distribution with equal probabilities for
heads and tails. It can be proved (with the aid of Khintchine’s iterated
logarithm theorem) that sooner or later a “significant” difference will
be observed, so that the rule can in fact be followed; there is no question
of having to go on forever. After any number n of spins, let x be the
cumulated number of heads and y the cumulated number of tails
(x + y = n). The progress of the sampling can be represented on a
diagram by the locus of a “sample point” with Cartesian coordinates
(x, y). Sampling terminates as soon as an appropriate boundary point
is reached. The first few boundary points are shown as heavy dots in
Fig. 1; for large n the boundary points lie approximately on the parabola

$$x - y = \pm 1.9600\sqrt{x + y}.$$

Clearly the use of the ordinary fixed-sample-size significance test 
at the end of this sampling procedure is completely invalid; for the 
probability of obtaining a significant result, if the null hypothesis 
that the coin is unbiased is true, is not just under 5% but is actually 
100%. 
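
How quickly the probability of a "significant" result builds up can be computed exactly for small numbers of spins. The sketch below is an illustration under stated assumptions, not the author's calculation: "significant at the 5% level" is taken to mean that twice the upper binomial tail probability is at most 0.05, and the names `two_sided_p` and `prob_stopped_by` are hypothetical. It follows every path of an unbiased coin and accumulates the probability of having stopped by a given number of spins.

```python
from math import comb


def two_sided_p(heads, tails):
    """Two-sided fixed-sample-size P-value for an unbiased coin (doubled tail)."""
    n = heads + tails
    larger = max(heads, tails)
    upper_tail = sum(comb(n, j) for j in range(larger, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


def prob_stopped_by(n_max, level=0.05):
    """Probability, for an unbiased coin, that sampling has stopped by n_max spins."""
    alive = {0: 1.0}  # number of heads -> probability of being there, not yet stopped
    stopped = 0.0
    for n in range(1, n_max + 1):
        reached = {}
        for heads, prob in alive.items():
            for step in (0, 1):  # tail or head on the next spin
                reached[heads + step] = reached.get(heads + step, 0.0) + 0.5 * prob
        alive = {}
        for heads, prob in reached.items():
            if two_sided_p(heads, n - heads) <= level:
                stopped += prob  # a boundary point: sampling terminates here
            else:
                alive[heads] = prob
    return stopped


for n_max in (5, 6, 9, 50, 200):
    print(n_max, prob_stopped_by(n_max))
```

Under these assumptions the first boundary points are reached at six spins (six heads or six tails, probability 2/64), and by nine spins the probability of having stopped already exceeds the nominal 5%; it keeps rising as more spins are allowed.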

One can similarly frame a sampling procedure designed to support 
the hypothesis that the coin is unbiased. It is necessary now to stop 
sampling before the cumulated numbers of heads and tails differ signifi- 
cantly at the chosen level when tested as if the sample size were fixed. 
One such procedure is represented by the boundary points shown as 
small circles in Fig. 1. For sample sizes less than 25, the boundary is 
chosen so that the difference between the cumulated numbers of heads 
and tails is not significant at the 10% level according to the usual test 
(without randomization device), but so that the difference could be 
significant at the 10% level if one further observation were taken. 
Thus as large a sample size as possible, but with upper limit at 25, is
taken, consistent with certainly not obtaining any difference significant 
at the 10% level. 

Here again the use of the ordinary fixed-sample-size test is completely 
invalid, for the probability of obtaining a result significant at the 10% 
level, if the null hypothesis that the coin is unbiased is true, is not just 
under 10% but is actually zero. 




I have spoken of spinning a coin to see whether it was unbiased,
but I could equally well have referred to testing the sex-ratio in an
animal population, or to testing anything else that can be regarded as
a binomial probability. The sampling procedure of continuing to take
observations until a result significant at a preassigned significance level
has been obtained can, I think, be followed with any ordinary
significance test whatever. Robbins [5] has discussed a procedure for showing
that the mean of a normal population differs significantly from a stated
value; it is exactly parallel to that just considered for a binomial
population.*

FIGURE 1. SPINNING A COIN TO REACH A FOREGONE CONCLUSION
Abscissa: cumulated number of heads.
Ordinate: cumulated number of tails.
The heavy dots are the first part of a boundary designed to demonstrate that the coin is biased. The
small circles constitute a boundary designed to demonstrate that the coin is not biased.


If an experimenter deliberately follows one of these procedures for
reaching a foregone conclusion, and does not admit it when reporting
his work, he can reasonably be accused of dishonesty. But it can
happen that something of the sort occurs without any conscious
intention to deceive. There has been a good deal of discussion of such
questions in the literature on extra-sensory perception. In particular,
Feller [3] has given a careful and interesting account of the possible
effects of “optional stopping”, and also of the non-publication of results
that are considered to be uninteresting.


Double Sampling 


It sometimes happens that an experimenter begins by taking a
certain number of observations, and then, if he considers that the results
are interesting he takes some more observations (perhaps as many
again), but if the results are not interesting he does not take any more
observations. In the comparison of a new treatment with an old one,
if the first sample indicates fairly clearly that the new treatment is not
superior to the old one the experimenter will probably wish to
discontinue the test, but if it seems possible that the new treatment is
superior to the old he will wish to investigate the matter further. At
the end, he will no doubt pool all the results and treat them as if the
sample size had been fixed in advance.

To see how great an error may be committed in such an analysis,
let us consider the following specific problem. The mean $\mu$ of a normal
population is to be estimated. The variance $\sigma^2$ of the population is
supposed known (or the sample is large enough for it to be estimated
with negligible error). A first sample of $n_1$ observations is taken and
its mean $\bar{x}_1$ is calculated. If $\bar{x}_1 > 0$, a second sample of $n_2$ observations
is taken, while if $\bar{x}_1 \le 0$, no further observations are taken. (Here
$n_1$ and $n_2$ are fixed.) Let $N$ denote the total number of observations
(so that $N = n_1$ or $n_1 + n_2$) and $\bar{x}$ the average of all the observations
(we have $\bar{x} = \bar{x}_1$ if $\bar{x}_1 \le 0$). An attempt is made to give 95% confidence
limits for $\mu$ by calculating

$$\bar{x} \pm 1.9600\,\sigma/\sqrt{N}.$$

What is the true probability that $\mu$ lies between these limits, for any
fixed value of $\mu$?


*Robbins was interested in cure as well as diagnosis. Having pointed out the possibility of sampling 
to reach a foregone conclusion, he showed how the nominal significance level of the fixed-sample-size 
test might be adjusted to allow for optional stopping anywhere between two preassigned sample sizes. 




It turns out that the answer depends on the values of $(\mu\sqrt{n_1})/\sigma$
and of $n_2/n_1$. The biggest variations occur when $n_2$ is much larger
than $n_1$. Fig. 2 shows how the true probability varies with $\mu$ in the
mathematically-simple limiting case when $n_2/n_1$ is infinite (continuous
curve), and also in the case $n_2 = n_1$ (broken curve). The ordinate is
the probability that $\mu$ lies between the calculated limits, the abscissa
is the value of $(\mu\sqrt{n_1})/\sigma$. For $n_2/n_1$ infinite, the probability varies
between 92.625% (realized when $\mu$ is such that the probability that
only one sample will be taken is just 2.5%) and 97.375% (realized
when $\mu$ is such that the probability that only one sample will be taken
is just 97.5%). If $\mu$ is so large that it is almost certain that two samples
will be taken, or so small that it is almost certain that only one sample
will be taken, the probability that the calculated limits will bracket $\mu$
is very close to the intended value of 95%.

FIGURE 2. DOUBLE SAMPLING TO ESTIMATE THE MEAN OF A NORMAL POPULATION
Abscissa: value of $(\mu\sqrt{n_1})/\sigma$.
Ordinate: probability that $\mu$ lies between the calculated limits.
The continuous curve is for $n_2/n_1$ infinite. The broken curve is for $n_2 = n_1$.

For smaller ratios of $n_2$ to $n_1$, the deviations of the true probability
from 95% are reduced in magnitude. Limits between which the true
probability varies are approximately as follows:

For $n_2 = \tfrac{1}{2}n_1$: 93.65% and 96.35%.
For $n_2 = n_1$: 93.34% and 96.66%.
For $n_2 = 2n_1$: 93.08% and 96.92%.




The values of $(\mu\sqrt{n_1})/\sigma$ at which these extreme values are attained
are the same as before, namely $\pm 1.9600$.
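
The probabilities quoted above can be reproduced numerically. The following is a rough sketch, not the author's method of calculation: it writes the standardized first-sample mean as a standard normal variable, splits the event that the limits bracket the population mean according to whether a second sample is taken, and integrates over the first-sample mean by a simple midpoint rule. The function name `coverage`, its arguments and the step count are assumptions made for this illustration; `theta` stands for the abscissa of Fig. 2 and `ratio` for the ratio of the second sample size to the first.

```python
from math import inf, sqrt
from statistics import NormalDist

STD = NormalDist()
C = STD.inv_cdf(0.975)  # two-sided 5% point of the normal distribution, about 1.96


def coverage(theta, ratio, steps=4000):
    """True probability that the nominal 95% limits bracket the mean.

    theta is mu * sqrt(n1) / sigma; ratio is n2 / n1 (math.inf for the limit).
    """
    # Only one sample is taken when the standardized first mean z1 <= -theta;
    # the limits then bracket the mean when |z1| <= C.
    one_sample = max(0.0, STD.cdf(min(C, -theta)) - STD.cdf(-C))
    if ratio == inf:
        # The pooled mean is then effectively the mean of the second sample.
        return one_sample + (1.0 - STD.cdf(-theta)) * (STD.cdf(C) - STD.cdf(-C))
    a = sqrt(1.0 / (1.0 + ratio))    # weight of z1 in the pooled standardized mean
    b = sqrt(ratio / (1.0 + ratio))  # weight of the independent second-sample term
    lower, span = -theta, 16.0       # integrate z1 over (-theta, -theta + 16)
    h = span / steps
    two_samples = 0.0
    for i in range(steps):
        z = lower + (i + 0.5) * h
        inside = STD.cdf((C - a * z) / b) - STD.cdf((-C - a * z) / b)
        two_samples += STD.pdf(z) * inside * h
    return one_sample + two_samples


for ratio in (0.5, 1.0, 2.0, inf):
    print(ratio, round(coverage(C, ratio), 4), round(coverage(-C, ratio), 4))
```

Evaluating at abscissa values of plus and minus 1.96 gives the lower and upper extremes respectively for each ratio.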

I have supposed that a second sample would be taken whenever
the mean of the first sample exceeded a certain value (taken to be 0).
Sometimes a second sample is taken only if the mean of the first sample
is neither very low nor very high, say only if $\bar{x}_1$ lies between $-a$ and
$a$ ($a$ being a fixed positive quantity). The disturbances produced in
the true probability that $\mu$ lies between the calculated confidence
limits, due to the abrupt changes in $N$ for $\bar{x}_1$ near to $-a$ and near to $a$,
are independent and additive; and the greatest divergence between
the true and nominal probabilities occurs for $(a\sqrt{n_1})/\sigma = 1.9600$,
$n_2/n_1$ large, and $\mu = 0$, when the true probability is 90.25%.

We can rephrase this last result as follows. A first sample of fixed
size $n_1$ is taken. If the mean $\bar{x}_1$ does not differ significantly at the
5% level from some critical value, say 0, a second sample of fixed size
$n_2$ is taken, $n_2$ being much larger than $n_1$. Otherwise, no further
observations are taken. Then if the population mean is in fact 0, the
probability that the mean of all the observations will differ significantly
from 0 at the 5% level, according to the ordinary fixed-sample-size
test, is not 5% but 9.75%.
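
The figure of 9.75% can be checked directly in the limiting case. As a sketch under the stated assumptions (population mean 0, second sample so much larger than the first that the pooled mean is effectively the mean of the second sample and independent of the first), write $z_1$ for the standardized mean of the first sample, a symbol introduced here only for illustration. If the first sample is significant, no more observations are taken and the final result is significant with certainty; if it is not, the pooled result is significant with probability 0.05. Hence

$$P(\text{significant}) = P(|z_1| \ge 1.96)\cdot 1 + P(|z_1| < 1.96)\cdot 0.05 = 0.05 + 0.95 \times 0.05 = 0.0975,$$

which is the complement of the true probability $0.95^2 = 0.9025$ that the calculated limits bracket the population mean.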


A Confidence Interval of Preassigned Width 


A refinement of the preceding double-sampling procedures is to take
observations in several small samples, or even one by one, until the
required amount of information has been obtained. To see the effect
of this greater flexibility of sampling, let us consider the following
specific example. It is required to estimate the mean $\mu$ of a normal
population by a confidence interval of width $l$ and coefficient $1 - \alpha$,
the population variance $\sigma^2$ being unknown. When any number $n$ of
observations have been taken, confidence limits for $\mu$ with coefficient
$1 - \alpha$ are given, according to the usual fixed-sample-size theory, by

$$\bar{x}_n \pm t_\alpha^{(n-1)}\, s/\sqrt{n},$$

where $\bar{x}_n$ is the mean of the $n$ observations, $s^2$ is the usual unbiased
quadratic estimate of $\sigma^2$, and $t_\alpha^{(n-1)}$ is such that a random variable
following Student’s distribution with $n - 1$ degrees of freedom has
probability $1 - \alpha$ of lying between $\pm t_\alpha^{(n-1)}$. Consider the sequential
sampling rule: observations are taken one by one until first

$$2\, t_\alpha^{(n-1)}\, s/\sqrt{n} \le l. \tag{1}$$

Denoting this value of $n$ by $N$, we ask what is the true probability
that $\mu$ lies between $\bar{x}_N \pm \tfrac{1}{2}l$.




Provided $l$ is small, so that $N$ is large, it can be shown that this
confidence interval has very nearly the required confidence coefficient
of $1 - \alpha$. Let

$$\nu = \frac{4 \sigma^2 t_\alpha^2}{l^2},$$

where $t_\alpha$ is such that a standard normal variable has probability $1 - \alpha$ of
lying between $\pm t_\alpha$. When it is an integer, $\nu$ is the number of observations
that would be taken if $\sigma$ were known beforehand. Let $\phi(t_\alpha)$ denote
the ordinate of the frequency function of a standard normal variable
when the abscissa is $t_\alpha$. Then it can be shown (by an easy deduction
from formulae given in [2]) that the true probability that $\mu$ lies between
$\bar{x}_N \pm \tfrac{1}{2} l$ is approximately

$$1 - \alpha - \frac{1.176\, t_\alpha \phi(t_\alpha)}{\nu}, \qquad (2)$$

provided $\nu$ is large (and provided that $N$ is in no case allowed to be
less than 4). If for example $\alpha = 5\%$, the true confidence coefficient
is approximately

$$\left(95 - \frac{13.5}{\nu}\right)\%. \qquad (3)$$


If instead of using the percentage point $t_\alpha^{(n-1)}$ of Student’s distribution
in the sampling rule (1) we had used the percentage point $t_\alpha$ of the
standard normal distribution, (3) would have been changed to


9 
95 — (4) 


Thus the error resulting from treating the sequential observations as
if the sample size were fixed is of the same order of size as, but less in
magnitude than, the error in replacing Student’s distribution by a 
standard normal distribution. Clearly, for practical purposes, the 
error is negligible unless the average sample size is quite small. 

A similar procedure can be followed when estimating the difference 
between the means of two normal populations of which the variances 
are unknown but supposed to be equal. The observations are taken 
in pairs, one from each population, and after n pairs have been taken 
an estimate of the population variance is available having 2(n — 1) 
degrees of freedom. Sampling continues until the confidence interval
with coefficient $1 - \alpha$ for the difference in means, calculated according
to the usual fixed-sample-size formula, is first less than or equal to the
given length $l$ in width. If $\bar{y}_N$ denotes the difference in sample means
when sampling terminates, the confidence interval is taken to be
$(\bar{y}_N - \tfrac{1}{2} l,\ \bar{y}_N + \tfrac{1}{2} l)$. Let

$$\nu = \frac{8 \sigma^2 t_\alpha^2}{l^2},$$

where $\sigma^2$ denotes the common population variance. Then the true
confidence coefficient for the calculated interval is approximately


— le) 6) 
v 

provided $\nu$ is large (and provided that $N$ is in no case allowed to be less
than 3). For $\alpha = 5\%$, this is

95 — 22 0, (6) 


v 


The divergence from the intended figure of 95% is considerably smaller 
than that indicated before at (3). If the percentage point of Student’s 
distribution is replaced by the standard normal percentage point in 
the stopping rule, the result corresponding to (4) above is 


_ 168 


95 %. (7) 


It is not the purpose of this paper to discuss the specification of 
sampling rules which will lead exactly or almost exactly to estimates 
having preassigned properties; I am concerned merely to show what 
error results from the strictly incorrect use of a fixed-sample-size formula. 
But it is perhaps worth while to point out that the sampling rule con- 
sidered above for estimating the mean of a single normal population 
can be improved very easily as follows. Observations are taken one 
by one until first the inequality (1) is satisfied, and then one further 
observation is taken. Denoting the final number of observations by
$N$, we calculate $\bar{x}_N \pm \tfrac{1}{2} l$. The probability that $\mu$ lies between these
limits is now given approximately by the expression (2), except that the
factor 1.176 is replaced by 0.176. The error is thus about one seventh 
of its previous value, and in (3) the factor 13.5 becomes 2.0. No device 
of this sort leads to any improvement in the sampling rule given above 
for estimating the difference in means of two normal populations, but 
the error in that case is already very small. (I am indebted to Professor 
J. W. Tukey for the suggestion that a simple improving device should 
be sought.) 
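
The factors 13.5 and 2.0 quoted for $\alpha = 5\%$ can be reproduced numerically. The sketch below assumes that expression (2) has the form $1 - \alpha - c\, t_\alpha \phi(t_\alpha)/\nu$, with $c = 1.176$ for the original rule and $c = 0.176$ when one further observation is taken; that form is inferred from the figures quoted in the text, and the function name is hypothetical.

```python
from statistics import NormalDist


def coverage_shortfall_percent(factor, alpha=0.05):
    """Return 100 * factor * t * phi(t): the numerator, in per cent, of the
    term of order 1/nu by which the true confidence coefficient falls short.

    Assumes the shortfall is factor * t * phi(t) / nu, where t is the
    two-sided standard normal percentage point for coefficient 1 - alpha.
    """
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return 100.0 * factor * t * std_normal.pdf(t)


original_rule = coverage_shortfall_percent(1.176)  # stop as soon as (1) holds
improved_rule = coverage_shortfall_percent(0.176)  # (1) plus one more observation

print(f"original rule: {original_rule:.1f}/nu per cent")
print(f"improved rule: {improved_rule:.1f}/nu per cent")
```

The ratio of the two values is 1.176/0.176, or about 6.7, which agrees with the statement that the error falls to about one seventh of its previous value.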




Discussion 


We have seen that, when the number of observations depends on 
the observations themselves, it can happen that a fixed-sample-size 
analysis of the observations is grossly wrong (as with our examples of 
sampling to reach a foregone conclusion), and it can also happen that a 
fixed-sample-size analysis is only very slightly in error (as with our 
examples of sampling to obtain a confidence interval of preassigned 
width). The examples of double sampling considered were intermediate, 
in that for some values of the unknown parameter there might be an 
appreciable (but not gross) error in the fixed-sample-size analysis, 
while for other values the error was negligible. 

The magnitude of the possible error in using a fixed-sample-size 
method of analysis is related to the dispersion of the sample size. By 
“dispersion of the sample size” I mean the variability in sample size 
that would be observed if the experiment were repeated several times
under similar conditions. It has been shown that if the average (or 
median) sample size is large and if the dispersion of the sample size 
is relatively small (say, if the coefficient of variation of the sample size 
is small), then there will be little error in treating the observations as 
if the sample size were fixed (see [1]). These conditions are fulfilled
in the examples of sampling to obtain a confidence interval of preassigned
width $l$, provided $l$ is small. If the sampling were carried out
several times, the sample sizes would probably show only a very small 
percentage variation. The conditions are not fulfilled in the examples 
of double sampling, unless the probability is either close to 0 or close 
to 1 that only one sample will be required. When $n_2 = n_1$, the coefficient
of variation of the sample size can be as high as 35%, this being
the value when the probability that only one sample will be taken is
about 2/3; if $n_2 = 2n_1$, the coefficient of variation can be as high as
57%; and so on. These are not very small coefficients of variation.
In the examples of sampling to reach a foregone conclusion the sample 
size has a very great dispersion, both very low and very high values 
being quite probable. If the procedure represented by heavy dots in 
Fig. 1 is followed, using an unbiased coin, the distribution of sample 
size has an infinite mean, and yet there is a substantial probability 
(about 0.11) that the sample size will not exceed 20. 
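
The coefficients of variation of 35% and 57% quoted above for double sampling follow from a two-point distribution of the sample size. The sketch below assumes that the sample size is $n_1$ with probability $p$ and $n_1 + n_2$ otherwise, and searches a grid of values of $p$ for the largest coefficient of variation; the function names and the grid search are illustrative choices, not part of the original argument.

```python
def sample_size_cv(p_stop, ratio):
    """Coefficient of variation of N, where N = n1 with probability p_stop
    and N = n1 + n2 = n1 * (1 + ratio) otherwise (ratio = n2 / n1)."""
    mean = 1.0 + ratio * (1.0 - p_stop)            # E(N), in units of n1
    sd = ratio * (p_stop * (1.0 - p_stop)) ** 0.5  # s.d. of N, in units of n1
    return sd / mean


def max_cv(ratio, steps=10_000):
    """Grid search over p_stop for the largest coefficient of variation."""
    best = max(range(1, steps), key=lambda i: sample_size_cv(i / steps, ratio))
    p_best = best / steps
    return sample_size_cv(p_best, ratio), p_best


cv_equal, p_equal = max_cv(1.0)    # n2 = n1
cv_double, p_double = max_cv(2.0)  # n2 = 2 * n1

print(f"n2 = n1:   max c.v. = {cv_equal:.3f} at p = {p_equal:.3f}")
print(f"n2 = 2 n1: max c.v. = {cv_double:.3f} at p = {p_double:.3f}")
```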

Thus we may suspect appreciable error in a fixed-sample-size analysis 
if the following conditions are both satisfied:— 


(i) the number of observations depends on the observations themselves, 
(ii) the relative dispersion of the number of observations in repeated 
sampling is not very small. 




But even if these conditions are satisfied, there is not necessarily any
danger of error. It is only with certain sorts of statistical analysis
that we can be misled in treating the sample size as fixed, namely when

(iii) reference is made in the statistical analysis to some property of
the distribution of a statistic.


Methods of analysis of this sort include: (1) calculating an unbiased 
estimate, with or without its standard error, (2) calculating a confidence 
interval, in the sense of J. Neyman’s theory, (3) making a significance 
test, in the sense of the Neyman-Pearson theory of tests. The terms
“unbiased”, “standard error”, “confidence coefficient”, “significance
level”, “critical region”, etc., can be explained only by reference to the
sampling distribution of a function of the observations; and it is because
this sampling distribution (given the sample size) is liable to be affected
by the sequential sampling rule followed that we run a risk of error in
supposing that the sample size is fixed when it is not.

All risk of error is avoided if the method of analysis uses the observa- 
tions only in the form of their likelihood function, since the likelihood 
function (given the observations) is independent of the sampling rule. 
One such method of analysis is provided by the classical theory of rational 
belief, in which a distribution of posterior probability is deduced, by 
Bayes’ theorem, from the likelihood function of the observations and a 
distribution of prior probability. Closely related to this is R. A. Fisher’s 
method of fiducial inference. Pragmatic methods of analysis in which 
the expected risks attached to alternative decisions are considered are 
also based on likelihoods, namely Wald’s method of minimax decision 
functions and various developments and modifications of that, in 
particular D. V. Lindley’s method of minimum unlikelihood. 
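
That the likelihood function does not depend on the sampling rule can be illustrated with a standard example that is not taken from the text: 3 successes observed in 12 trials, obtained either with the number of trials fixed in advance or by sampling until the third success. The sketch below, with hypothetical function names, shows that the two likelihoods differ only by a constant factor, so that any analysis based on the likelihood alone gives the same answer under both rules.

```python
from math import comb


def fixed_n_likelihood(p, successes, n):
    """Binomial likelihood: the number of trials n was fixed in advance."""
    return comb(n, successes) * p ** successes * (1.0 - p) ** (n - successes)


def inverse_sampling_likelihood(p, successes, n):
    """Likelihood when sampling continued until `successes` successes had
    occurred, the last of them falling on trial n."""
    return comb(n - 1, successes - 1) * p ** successes * (1.0 - p) ** (n - successes)


# Ratio of the two likelihoods at several values of p: it does not depend on p.
ratios = [
    fixed_n_likelihood(p, 3, 12) / inverse_sampling_likelihood(p, 3, 12)
    for p in (0.1, 0.25, 0.5, 0.8)
]
print(ratios)
```

The sampling distributions of the estimate under the two rules are nevertheless different, which is why methods that refer to the distribution of a statistic can be affected.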

Unfortunately many of us find the sort of analysis that refers to 
the distribution of a statistic more serviceable in practice than methods 
based directly on likelihood, despite the theoretical advantages of the 
latter. The topics of this paper therefore seem to me to be worth 
occasional consideration. If we are able to conclude that the dangers 
feared are unimportant and negligible in the situations with which we 
have to deal, then so much the better. 


REFERENCES 

[1] Anscombe, F. J. Large-sample theory of sequential estimation. Proc. Camb.
phil. Soc., 48 (1952), 600.

[2] Anscombe, F. J. Sequential estimation. J. R. statist. Soc. B, 15 (1953), 1.

[3] Feller, W. K. Statistical aspects of ESP. J. Parapsychol., 4 (1940), 271.

[4] Lindley, D. V. Statistical inference. J. R. statist. Soc. B, 15 (1953), 30.

[5] Robbins, H. Some aspects of the sequential design of experiments. Bull. Amer.
math. Soc., 58 (1952), 527.




THE COMBINATION OF ESTIMATES FROM DIFFERENT 
EXPERIMENTS* 


W. G. COCHRAN
The Johns Hopkins University, 
Baltimore, Maryland 


1. INTRODUCTION


When we are trying to make the best estimate of some quantity
$\mu$ that is available from the research conducted to date, the problem
of combining results from different experiments is encountered. The
problem is often troublesome, particularly if the individual estimates
were made by different workers using different procedures. This paper
discusses one of the simpler aspects of the problem, in which there is
sufficient uniformity of experimental methods so that the $i$th experiment
provides an estimate $x_i$ of $\mu$, and an estimate $s_i$ of the standard
error of $x_i$. The experiments may be, for example, determinations of
a physical or astronomical constant by different scientists, or bioassays
carried out in different laboratories, or agricultural field experiments
laid out in different parts of a region. The quantity $x_i$ may be a simple
mean of the observations, as in a physical determination, or the difference
between the means of two treatments, as in a comparative experiment,
or a median lethal dose, or a regression coefficient.

The problem of making a combined estimate has been discussed 
previously by Cochran (1937) and Yates and Cochran (1938) for 
agricultural experiments, and by Bliss (1952) for bioassays in different 
laboratories. The last two papers give recommendations for the practical 
worker. My purposes in treating the subject again are to discuss it in 
more general terms, to take account of some recent theoretical research, 
and, I hope, to bring the practical recommendations to the attention 
of some biologists who are not acquainted with the previous papers. 

The basic issue with which this paper deals is as follows. The
simplest method of combining estimates made in a number of different
experiments is to take the arithmetic mean of the estimates. If, however,
the experiments vary in size, or appear to be of different precision, the
investigator may wonder whether some kind of weighted mean would
be more precise. This paper gives recommendations about the kinds
of weighted mean that are appropriate, the situations in which they
are appropriate, and the circumstances in which the unweighted mean
is to be preferred. Methods for obtaining a standard error to be attached
to the final estimate are also presented.


*Department of Biostatistics, Paper No. 292. This work was assisted by a contract with the Office
of Naval Research.


The mathematical theory which bears on the problem is complex, 
and some of the recommendations are based on approximations in 
theory. Wherever possible, the recommendations are documented by 
references to published papers. Some theoretical issues are discussed 
briefly in sections 6 to 9 in cases where the documentation available in 
the literature does not seem adequate. 


2. MATHEMATICAL MODELS 


As will appear later, the best combined estimate depends on the 
nature of the data. It is advisable to consider the preliminary question: 

Do the values of the $x_i$ agree among themselves within the limits of
their experimental errors?


If they do, we may postulate an underlying mathematical model
of the form

$$x_i = \mu + e_i, \qquad (1)$$

where $e_i$ is the experimental error of $x_i$.


If the values of the $x_i$ differ by more than can be accounted for by
their experimental errors, we require a model of the form

$$x_i = \mu_i + e_i, \qquad (2)$$

where $\mu_i$, which might be called the “true value” in the $i$th experiment,
varies from one experiment to another. There are numerous reasons
why such variations may exist. They may be the result of differences
in the experimental techniques used in the different experiments, of
biases that vary in size from one experiment to another, or of real
changes in $\mu_i$ due to the environment in which the experiment is conducted.
Frequently the investigator is able to predict, from general
knowledge or from past experience with the same type of data, whether
the $\mu_i$ are likely to vary. In agricultural experiments that are located on
farmers’ fields throughout an area, for instance, it is quite commonly 
found that the response to a fertilizer exhibits a real variation from 
field to field. This variation is often described as an interaction of the 
effect with experiments. Tests of significance for this interaction will 
be presented later. 

If an interaction exists, the type of combined estimate that is wanted 
requires careful consideration. It is necessary to take into account the 
purpose for which the combined estimate is to be made and the reasons 
for the presence of interaction, in so far as these can be discovered. 




The following are illustrations of some of the situations that may arise. 

(1) In the determination of a physical constant, we might conclude 
that interaction exists because some of the experiments (e.g. the earlier 
ones) were done by a technique that is subject to a bias of unknown 
magnitude, whereas the remainder of the experiments appear to be 
unbiased. In this event we would presumably discard the results 
from the biased experiments and consider only a combination of results 
from the unbiased experiments. 

(2) In agricultural experiments the variation in $\mu_i$ may be due
mainly to the soil type on which the experiments are conducted. The
experiments can then be classified into groups, each of which represents
a specific soil type. It may also happen that the number of experiments
on a given soil type is not at all proportional to the area of the crop
under that soil type in practical farming, perhaps because the experiments
were deliberately set up to include some of the rarer types. In
this case, if our object is to estimate a mean over some defined farming
area, we might adopt the kind of weighted mean that is appropriate to
stratified sampling. Thus if $\bar{x}_j$ is the estimated mean for the $j$th soil
type, and $A_j$ is the estimated area of the crop under this type in the
population, the overall mean is taken as $\sum A_j \bar{x}_j / \sum A_j$.

(3) In the preceding situation we might decide, alternatively, not to 
estimate the overall mean at all, but to present the individual estimates 
for the different groups or strata. This practice is advisable where the 
$\mu_i$ vary so much that different practical recommendations must be
given in different strata. Of course, such recommendations are feasible 
only when the user of the results knows to which stratum he belongs. 
An example might be experiments on the feeding of chickens, where 
the results vary with the breed of the chickens. 

(4) Occasionally, in laboratory experiments which were thought to 
be well-controlled, large interactions may appear for which no adequate 
explanation can be given. In this event it might be best to hand the 
problem back to the experimenters, on the grounds that there is not 
much point in attempting a “best” combined estimate until the experimenters
can reach better agreement in their results, or at least
find out why they disagree. 
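
The area-weighted mean described in illustration (2) can be sketched as follows. The soil-type means and crop areas are hypothetical values invented for the example, as is the function name.

```python
def stratified_mean(stratum_means, stratum_areas):
    """Overall mean sum(A_j * xbar_j) / sum(A_j), weighting the estimated
    mean for each soil type by the crop area under that type."""
    if len(stratum_means) != len(stratum_areas):
        raise ValueError("one area is required for each stratum mean")
    total_area = sum(stratum_areas)
    weighted_sum = sum(a * m for a, m in zip(stratum_areas, stratum_means))
    return weighted_sum / total_area


# Hypothetical responses on three soil types and the crop areas under each.
overall = stratified_mean([12.0, 20.0, 5.0], [500.0, 300.0, 200.0])
print(overall)
```

The weights here depend only on the areas, not on the number of experiments carried out on each soil type, which is the point of the illustration.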

These illustrations, which do not exhaust the possibilities, bring 
out the point that the combination of the individual estimates is not a 
routine matter, but requires clear thinking about both the nature of 
the data and the function of a combined estimate. However, unless it 
is decided that no type of combined estimate will serve a useful purpose, 
we do face the problem of combining at least over certain subgroups of 
the experiments. 




In the remainder of this paper it will be assumed that the experiments 
which we have decided to combine are a random sample from the popu- 
lation of experiments about which we wish information. This assump- 
tion is far from being universally true in practice and should be examined 
before adopting the methods in this paper, since series of experiments 
often come into existence in a rather haphazard way. 

The discussion will deal only with the combination of a single 
estimate $x_i$ from each experiment. When each experiment contains
more than 2 treatments, we may wish to make a combined analysis of 
all the experimental results. Some methods for handling this problem 
are given by Yates and Cochran (1938), Cochran and Cox (1950) and 
Kempthorne (1952). 


3. EXPERIMENTS OF THE SAME SIZE AND THE SAME PRECISION 


The simplest case is that in which all $k$ experiments are of exactly
the same type, with no missing data, and the estimates $x_i$ all have the
same error variance $\sigma^2$. In this event the estimated variances $s_i^2$ will
each have $n$ degrees of freedom and will each be unbiased estimates of $\sigma^2$.
To avoid confusion, note that the symbols $s_i^2$ and $\sigma^2$ refer to the variance
of $x_i$, not to the variance per single observation in the experiment.

This case will occur when every experiment has the same precision
per observation, and $x_i$ is the same linear function of the observations
in the experiment. Thus the variance $\sigma^2$ will be of the form $\sigma_0^2/f$,
where $\sigma_0^2$ is the common variance per observation, and $f$ is a divisor
which is the same in all experiments. For example, if $x_i$ is an unweighted
mean over $r$ replications, $f = r$, and if $x_i$ is the difference between two
such means, $f = r/2$. This case would not apply, however, if $x_i$ were
the regression of yield on plant number, because the variance of $x_i$
would depend on the distribution of plant numbers in the $i$th experiment.

This case can be handled by familiar and elementary methods, but 
is included for completeness. 

To test whether the $x_i$ are of the same precision we may apply
Bartlett’s test, in which we compute $\chi^2$, with $(k - 1)$ degrees of freedom,
as

$$\chi^2 = \frac{n}{C}\left[k \log_e \bar{s}^2 - \sum_{i=1}^{k} \log_e s_i^2\right], \qquad (3)$$

where $\bar{s}^2$ is the arithmetic mean of the $s_i^2$ and

$$C = 1 + \frac{(k + 1)}{3nk}.$$

Although the investigator can never be sure that the $x_i$ all have the
same variance, it is suggested, as a working rule, that the methods in

this section are adequate whenever Bartlett’s $\chi^2$ is not significant at
the 5 per cent level. This opinion is based on the results of a number 
of sets of data which were worked with and without the assumption of 
homogeneity. Methods which do not require the assumption are given 
in section 5. 

On the assumption that the $s_i^2$ are homogeneous, the interaction of
the $x_i$ with experiments can be tested by means of a standard $F$-test
in the analysis of variance (table 1).


TABLE 1
TEST OF THE VARIATION IN $x_i$ FROM EXPERIMENT TO EXPERIMENT

Source of variation             d.f.       Mean squares
Interaction with experiments    (k - 1)    $s_b^2 = \sum (x_i - \bar{x})^2/(k - 1)$
Pooled internal error           nk         $\bar{s}^2$

Interactions Absent.


If there is no interaction ($\mu_i$ all equal to $\mu$), then from equation (1)
each $x_i$ is an estimate of $\mu$ with common variance $\sigma^2$. Hence, if the $x_i$
are approximately normally distributed, the recommended estimate of
$\mu$ is their unweighted mean $\bar{x}$, with variance $\sigma^2/k$.

To find a sample estimate of the standard error of $\bar{x}$, we may note
that the quantities $\bar{s}^2$ and $s_b^2$ are both estimates of $\sigma^2$, with $nk$ and
$(k - 1)$ degrees of freedom, respectively. The best estimate of the
standard error of $\bar{x}$ is the pooled value

$$\text{s.e.}(\bar{x}) = \sqrt{\frac{nk\bar{s}^2 + (k - 1)s_b^2}{k(nk + k - 1)}} \qquad (4)$$

with $(nk + k - 1)$ degrees of freedom. Since $nk$ is usually much
greater than $(k - 1)$, the use of $\bar{s}/\sqrt{k}$ as the estimated standard
error is not uncommon.


Interactions Present. 


In this event, the quantity to be estimated is the population mean
$\mu$ of the $\mu_i$. Let $\sigma_\mu$ be the standard deviation of the distribution of
the $\mu_i$. Then from equation (2),

$$x_i = \mu + (\mu_i - \mu) + e_i.$$

It follows that the estimates $x_i$ vary about $\mu$ with variance $(\sigma_\mu^2 + \sigma^2)$.
Since the $x_i$ are still of equal precision as estimates of $\mu$, the unweighted
mean $\bar{x}$ is still the best estimate of $\mu$. However, the variance
of $\bar{x}$ is now $(\sigma_\mu^2 + \sigma^2)/k$, and expression (4) cannot be used for the
estimated standard error of $\bar{x}$. It is easy to show algebraically that
$s_b^2$ in table 1 is an unbiased estimate of $(\sigma_\mu^2 + \sigma^2)$, so that for the standard
error of $\bar{x}$ we use

$$\text{s.e.}(\bar{x}) = \sqrt{\frac{\sum (x_i - \bar{x})^2}{k(k - 1)}}. \qquad (5)$$


To summarize, the only decision that needs to be made is whether 
we will regard interactions as present or absent. The more conservative 
procedure is always to regard interactions as present, since estimate 
(5) for the standard error is valid whether interactions are present or 
not. If the number of experiments, k, is small, however, this estimate 
has low precision, and one is tempted to use the pooled estimate (4). 

A procedure followed by some workers is to pool whenever the
$F$ from table 1 is not significant at the 5 per cent level. This has been
criticized by others on the grounds that it may underestimate the
standard error of $\bar{x}$. The consequences of a rule of this kind have been
examined by Bancroft (1944) and Paull (1950) for a series of values
of the ratio $I = \sigma_\mu^2/\sigma^2$. Their results show that the rule is somewhat
hazardous, in that for moderate values of $I$ (say between 1/4 and 4)
it underestimates the standard error and gives too many significant
results in a subsequent $t$-test of $\bar{x}$. The alternative rule of pooling only
when $F < 2$, suggested by Paull, is safer: its chief defect is that if
$I$ is small, it may slightly overestimate the standard error.
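
The calculations of this section can be collected in a short sketch. It assumes that expression (4) is the usual pooled form $\sqrt{[nk\bar{s}^2 + (k-1)s_b^2]/[k(nk + k - 1)]}$ and that expression (5) is $\sqrt{\sum (x_i - \bar{x})^2/[k(k-1)]}$; the data, the function name and the variable names are hypothetical.

```python
from math import sqrt


def combine_equal_precision(x, s2, n):
    """Unweighted mean of the x_i, with F for interaction and both
    candidate standard errors.

    x  : estimates x_i from the k experiments
    s2 : estimated variances s_i^2 of the x_i, each on n degrees of freedom
    """
    k = len(x)
    mean = sum(x) / k
    s_b2 = sum((xi - mean) ** 2 for xi in x) / (k - 1)  # interaction mean square
    s_bar2 = sum(s2) / k                                # pooled internal error
    f_ratio = s_b2 / s_bar2
    # Pooled standard error, valid only if interactions are absent.
    se_pooled = sqrt((n * k * s_bar2 + (k - 1) * s_b2) / (k * (n * k + k - 1)))
    # Standard error valid whether interactions are present or not.
    se_interaction = sqrt(s_b2 / k)
    return mean, f_ratio, se_pooled, se_interaction


# Hypothetical data: three experiments, each variance on 8 degrees of freedom.
mean, f_ratio, se_pooled, se_interaction = combine_equal_precision(
    [10.0, 12.0, 17.0], [4.0, 5.0, 3.0], n=8
)

# Paull's rule as described above: pool only when F < 2.
standard_error = se_pooled if f_ratio < 2.0 else se_interaction
print(mean, round(f_ratio, 2), round(standard_error, 3))
```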


4. EXPERIMENTS OF DIFFERENT SIZES BUT OF THE SAME PRECISION
PER OBSERVATION 

Sometimes the experiments that are being combined differ in size
and structure, but there is reason to believe that experimental error
variances per observation are the same in all experiments. If $\sigma_0^2$ denotes
this common variance, the variance $\sigma_i^2$ of the estimate $x_i$ will be of the
form $\sigma_i^2 = \sigma_0^2/f_i$, where $f_i$ is a factor depending on the type of experiment.
For instance, if $x_i$ is a mean over $r_i$ replications, $f_i = r_i$, and
if $x_i$ is the difference between two such means, $f_i = r_i/2$. Similarly,
if $s_{0i}^2$ denotes the estimated error variance per observation in the $i$th
experiment, $s_i^2 = s_{0i}^2/f_i$.


Example 1. 


This example was obtained by selecting two treatments from a
series of experiments on the effectiveness of carbon tetrachloride in
killing worms (Nippostrongylus muris) which are parasitic on rats
(Whitlock and Bliss, 1943). Each rat was injected with 500 larvae.
Eight days later, the rats were treated with varying doses of CCl$_4$
and two days later all rats were killed and the numbers of adult worms
were counted for each rat. The treatments to be discussed are the
control (no CCl$_4$) and a dose of 0.063 cc per rat. Three experiments
included both treatments, with numbers of replications as follows.


Expt.    Control    0.063 cc CCl$_4$
1        5          3
2        5          5
3        6          7


The relevant data are shown in table 2. 


TABLE 2
ESTIMATES ($x_i$) AND VARIANCES PER RAT ($s_{0i}^2$)

          No. of adult worms      Difference                          d.f.
Expt.     Control     CCl$_4$     $x_i$      $s_{0i}^2$    $f_i$      $n_i$
1         290.4       204.0        86.4      3,223         1.875      10
2         323.2       165.2       158.0      8,370         2.500      14
3         274.0       262.7        11.3      2,606         3.231      16


The estimates to be combined are the differences $x_i$ between the
mean recoveries for the control and the treated rats. The values of
$f_i$ may be verified from the numbers of replications already reported.
In computing the variances per rat, $s_{0i}^2$, the data from treatments with
smaller doses of CCl$_4$ were also used, so that the degrees of freedom are
larger than would be provided by the two treatments discussed here.

The first step is to apply Bartlett’s $\chi^2$ test to the estimated variances
per observation.

$$\chi^2 = \frac{1}{C}\left[n_e \log_e \bar{s}_0^2 - \sum n_i \log_e s_{0i}^2\right], \qquad (6)$$

where

$$n_e = \sum n_i = 40,$$

$$\bar{s}_0^2 = \sum n_i s_{0i}^2 / n_e = (191,106)/40 = 4,777.6,$$

$$C = 1 + \frac{1}{3(k - 1)}\left[\sum \frac{1}{n_i} - \frac{1}{n_e}\right] = 1.035.$$

The value of $\chi^2$ is 5.59, with $k - 1 = 2$ degrees of freedom. The
significance probability is about 0.06, and it is doubtful whether the
variances per rat can be considered homogeneous. For the present,
this assumption will be made: in section 5 the example will be re-worked
without this assumption.
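
The value of $\chi^2$ can be reproduced from the variances and degrees of freedom in table 2. The sketch below uses the standard form of Bartlett’s statistic with natural logarithms and unequal degrees of freedom; the function name and the layout of the returned values are illustrative choices.

```python
from math import log


def bartlett_chi2(variances, dfs):
    """Bartlett's test of homogeneity for k estimated variances.

    variances : estimated variances per observation, s_0i^2
    dfs       : their degrees of freedom, n_i
    Returns (chi-square, degrees of freedom, pooled variance, correction C).
    """
    k = len(variances)
    n_e = sum(dfs)
    pooled = sum(n * v for n, v in zip(dfs, variances)) / n_e
    m = n_e * log(pooled) - sum(n * log(v) for n, v in zip(dfs, variances))
    c = 1.0 + (sum(1.0 / n for n in dfs) - 1.0 / n_e) / (3.0 * (k - 1))
    return m / c, k - 1, pooled, c


# Variances per rat and degrees of freedom from table 2.
chi2, df, pooled, c = bartlett_chi2([3223.0, 8370.0, 2606.0], [10, 14, 16])
print(f"chi-square = {chi2:.2f} on {df} d.f.; pooled variance = {pooled:.1f}; C = {c:.3f}")
```

With all the degrees of freedom equal to $n$, the same function gives the statistic for the equal-precision case of section 3.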

In order to test whether the estimates $x_i$ agree with each other
within the limits of their experimental error variances, we carry out a
conventional analysis of variance on a single-observation basis (table 3).


TABLE 3
ANALYSIS OF VARIANCE ON A SINGLE-OBSERVATION BASIS

Source of variation        d.f.       Sum of squares                    Mean squares
Interaction with expts.    (k - 1)    $\sum f_i (x_i - \bar{x}_w)^2$    $s_w^2 = \sum f_i (x_i - \bar{x}_w)^2/(k - 1)$
Pooled error               $n_e$      $\sum n_i s_{0i}^2$               $\bar{s}_0^2$


Note that the factors $f_i$ are used as weights in computing the sum of
squares for the interaction with experiments.
The quantity $\bar{x}_w$ in table 3 is the weighted mean

$$\bar{x}_w = \frac{\sum f_i x_i}{\sum f_i}. \qquad (7)$$

The $F$-ratio, $s_w^2/\bar{s}_0^2$, gives a test of significance of the presence of interactions.

At this point there are three situations to be considered. 
Interactions absent. 

This case is a familiar one in elementary text-books. We revert
to the mathematical model

$$x_i = \mu + e_i,$$

so that $x_i$ is an estimate of $\mu$ with variance $\sigma_0^2/f_i$. By least squares
theory, the best combined estimate of $\mu$ is the weighted mean $\bar{x}_w$,
and its variance is

$$V(\bar{x}_w) = \frac{\sigma_0^2}{\sum f_i}.$$

The most precise combined estimate of $\sigma_0^2$ is obtained by pooling the
sums of squares in table 3 to give

$$s_p^2 = \frac{(k - 1) s_w^2 + n_e \bar{s}_0^2}{k - 1 + n_e}.$$

The standard error of $\bar{x}_w$ is taken as $\sqrt{s_p^2 / \sum f_i}$, with $(k - 1 + n_e)$
degrees of freedom.
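
As a check on the arithmetic only, the weighted analysis can be carried through for the data of table 2. The sketch assumes the forms of the weighted mean, the interaction sum of squares and the pooled variance described above; the variable names are hypothetical. For these data the $F$-ratio is large enough that the weighted mean would not in fact be the recommended estimate.

```python
from math import sqrt

# Data from table 2.
x = [86.4, 158.0, 11.3]            # differences x_i
s0_sq = [3223.0, 8370.0, 2606.0]   # variances per rat, s_0i^2
f = [1.875, 2.500, 3.231]          # factors f_i
n = [10, 14, 16]                   # degrees of freedom n_i

k = len(x)
sum_f = sum(f)
n_e = sum(n)

# Weighted mean, with the f_i as weights.
x_w = sum(fi * xi for fi, xi in zip(f, x)) / sum_f

# Sums of squares on a single-observation basis.
ss_interaction = sum(fi * (xi - x_w) ** 2 for fi, xi in zip(f, x))
ss_error = sum(ni * si for ni, si in zip(n, s0_sq))

s_w_sq = ss_interaction / (k - 1)   # interaction mean square
s0_bar_sq = ss_error / n_e          # pooled error mean square
f_ratio = s_w_sq / s0_bar_sq

# Pooled variance and standard error, appropriate only if interactions are absent.
s_p_sq = (ss_interaction + ss_error) / (k - 1 + n_e)
se_x_w = sqrt(s_p_sq / sum_f)

print(f"weighted mean = {x_w:.2f}, F = {f_ratio:.2f}")
```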


Interactions large. 


With the more general model

$$x_i = \mu_i + e_i,$$

the variance of $x_i$, as an estimate of $\mu$, is

$$V(x_i) = \sigma_\mu^2 + \frac{\sigma_0^2}{f_i}.$$

If the values of $\sigma_0$ and $\sigma_\mu$ were known, the least squares estimate of $\mu$
would be the semi-weighted mean

$$\bar{x}_{sw} = \frac{\sum W_i x_i}{\sum W_i}, \quad \text{where } W_i = \frac{1}{\sigma_\mu^2 + \sigma_0^2/f_i}. \qquad (8)$$

The semi-weighted mean (8) includes the weighted mean (7) as a
particular case, since it reduces to the weighted mean when $\sigma_\mu = 0$.
At the other extreme, when interactions are large, $\sigma_\mu$ is large relative
to $\sigma_0$ and the semi-weights are all approximately equal. The semi-weighted
mean then differs little from the unweighted mean.

Since it is not profitable to go to the extra trouble of computing the
semi-weighted mean unless we are confident that there will be a worthwhile
gain in precision over the unweighted mean, the precision of the
unweighted mean is compared with that of the semi-weighted mean in
section 6. The relative precision is found to depend on two factors:


(i) The ratio $I$ of the interaction variance $\sigma_\mu^2$ to the average of the
experimental error variances $\sigma_0^2/f_i$. The higher the value of $I$, the
smaller is the loss of precision resulting from the use of the unweighted
mean.

(ii) The amount of variation in the factors $f_i$. As the variation in
the $f_i$ increases, the loss of precision resulting from the unweighted mean
increases.



The theoretical examination in section 6 leads to the rules given in 
table 4. 


TABLE 4
WORKING RULES FOR THE USE OF THE UNWEIGHTED MEAN

If ratio of largest        Use the unweighted mean
to smallest $f_i$          whenever
< 2                        F > 3
between 2 and 6            F > 4
> 6                        F > 5
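
The working rules of table 4 can be written as a small function. This is a sketch only: the table does not say how a ratio of exactly 2 or exactly 6 is to be treated, and placing both in the middle band is an assumption made here. The function name is hypothetical.

```python
def prefers_unweighted_mean(f, f_ratio):
    """Apply the working rules of table 4.

    f       : the factors f_i of the experiments being combined
    f_ratio : the F-ratio for interaction with experiments
    Returns True when the rules recommend the unweighted mean.
    """
    ratio = max(f) / min(f)
    if ratio < 2.0:
        threshold = 3.0
    elif ratio <= 6.0:  # boundary values 2 and 6 placed here by assumption
        threshold = 4.0
    else:
        threshold = 5.0
    return f_ratio > threshold


# Factors f_i from table 2, with the F-ratio of 3.19 found for example 1.
decision = prefers_unweighted_mean([1.875, 2.500, 3.231], 3.19)
print(decision)
```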


The rules will be illustrated from example 1. The analysis of variance 
appears in table 5. 


TABLE 5
ANALYSIS OF VARIANCE FOR DATA IN TABLE 2

Source of variation        d.f.    Sums of squares    Mean squares              F
Interaction with expts.    2       30,506             $s_w^2 = 15,253$          3.19
Pooled error               40      191,106            $\bar{s}_0^2 = 4,778$

The $F$-ratio, 3.19, is almost at the 5 per cent level, indicating a


variation in the effectiveness of the CCl$_4$ from experiment to experiment.
From table 2 we see that the ratio of the largest to the smallest $f_i$ is
less than 2. By the rule in table 4, the unweighted mean of the $x_i$ is
recommended since $F$ is over 3. The estimate is

$$\bar{x} = \frac{86.4 + 158.0 + 11.3}{3} = 85.2.$$

The standard error of $\bar{x}$ is given by

$$\text{s.e.}(\bar{x}) = \sqrt{\frac{\sum (x_i - \bar{x})^2}{k(k - 1)}} = \sqrt{\frac{10,763}{(3)(2)}} = 42.4.$$

The usual procedure is to attribute $(k - 1)$ or 2 degrees of freedom
to this standard error. This is not quite correct, because the $x_i$ are
presumably not all of the same precision as estimates of $\mu$, even though
we have decided to use their unweighted mean as an overall estimate.


An approximate adjustment which gives a smaller number of degrees
of freedom is developed in section 7, although my experience is that it is
rarely needed for this application. The adjustment requires some
supplementary calculations which are given for another purpose in the
subsequent table 6.

In column 2 of table 6, let

$$v_i = s_\mu^2 + \frac{\bar{s}_0^2}{f_i},$$

and let

$$\bar{V} = \frac{\sum v_i}{k} = 6,211 : \qquad V_2 = \frac{\sum v_i^2}{k} = 38,773,457.$$

The adjusted number of degrees of freedom is

$$n_a = \frac{(k - 1)^2 \bar{V}^2}{(k - 2)V_2 + \bar{V}^2} = 1.99. \qquad (9)$$

The adjustment has a negligible effect.
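
The unweighted mean, its standard error and the adjusted degrees of freedom for example 1 can be checked as follows. The sketch assumes that equation (9) has the form $(k-1)^2 \bar{V}^2 / [(k-2)V_2 + \bar{V}^2]$, which reproduces the quoted totals, and it takes the first entry of column 2 of table 6, which is only partly legible in the source, to be 6,780. The function names are hypothetical.

```python
from math import sqrt


def unweighted_mean_and_se(x):
    """Unweighted mean of the x_i and its standard error,
    sqrt(sum (x_i - xbar)^2 / (k (k - 1)))."""
    k = len(x)
    mean = sum(x) / k
    ss = sum((xi - mean) ** 2 for xi in x)
    return mean, sqrt(ss / (k * (k - 1)))


def adjusted_df(v):
    """Approximate degrees of freedom for that standard error when the
    x_i have unequal variances v_i (assumed form of equation (9))."""
    k = len(v)
    v_bar = sum(v) / k
    v_2 = sum(vi ** 2 for vi in v) / k
    return (k - 1) ** 2 * v_bar ** 2 / ((k - 2) * v_2 + v_bar ** 2)


mean, se = unweighted_mean_and_se([86.4, 158.0, 11.3])
# Column 2 of table 6; the first value is an assumption (see the note above).
df = adjusted_df([6780.0, 6143.0, 5711.0])
print(f"mean = {mean:.1f}, s.e. = {se:.2f}, adjusted d.f. = {df:.3f}")
```

When all the $v_i$ are equal the expression reduces to $k - 1$, the usual number of degrees of freedom.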


Interactions moderate. 


With most sets of data, either the weighted or the unweighted mean
will prove to be satisfactory. There remain some cases in which,
although $F$ is not large, we believe from the nature of the data that
interactions are likely to be present, and are reluctant to rely on either
the unweighted or the weighted mean. A sample semi-weighted mean,
which is an analogue of the semi-weighted mean in equation (8), may be
tried.

In equation (8) the true semi-weights $W_i$ were given as

$$W_i = \frac{1}{\sigma_\mu^2 + \sigma_0^2/f_i}.$$

The first step is to obtain sample estimates of $\sigma_\mu^2$ and $\sigma_0^2$.

It is easily shown by algebra that the expectations of the two mean
squares in the analysis of variance in table 3 are as follows.

$$E(s_w^2) = \sigma_0^2 + f' \sigma_\mu^2$$

$$E(\bar{s}_0^2) = \sigma_0^2$$

The quantity $f'$ is a little smaller than the arithmetic mean of the $f_i$.
From these results, an unbiased estimate of $\sigma_\mu^2$ is

$$s_\mu^2 = \frac{s_w^2 - \bar{s}_0^2}{f'}, \qquad (10)$$

where

$$f' = \frac{1}{k - 1}\left[\sum f_i - \frac{\sum f_i^2}{\sum f_i}\right].$$

Finally, for the sample semi-weighted mean we take

$$\bar{x}_{sw} = \frac{\sum \hat{W}_i x_i}{\sum \hat{W}_i},$$

where

$$\hat{W}_i = \frac{1}{s_\mu^2 + \bar{s}_0^2/f_i}.$$

To illustrate the method from example 1, the analysis of variance
(table 5) gives

$$s_w^2 = 15,253 : \qquad \bar{s}_0^2 = 4,778.$$

The value of $f'$ is found from table 2 to be

$$f' = \frac{1}{2}\left[7.606 - \frac{20.205}{7.606}\right] = 2.475.$$


TABLE 6 
CALCULATION OF SEMI-WEIGHTED MEAN 


4,778 | Reciprocal 
| _ 939 4,448 eciproca 
Expt f; Si | W; 
1 | 6,7 | .000147 86.4 
2 6,143 | 000163 158.0 
3 | 5,711 000175 11.3 
Total | | 000485 
| 
Hence, from equation (10), 
> 15,253 — 4 


The semi-weights are computed in table 6.

$$\bar{x}_{sw} = \frac{(86.4)(147) + (158.0)(163) + (11.3)(175)}{485} = 83.4$$


This does not differ materially from the unweighted mean, 85.2.
The standard error of $\bar{x}_{sw}$ is, approximately,

$$\text{s.e.}(\bar{x}_{sw}) = \frac{1}{\sqrt{\sum \hat{W}_i}} = \frac{1}{\sqrt{.000485}} = 45.4$$


This formula may give values that are slightly too low, since it
ignores the fact that the weights $\hat{W}_i$ are subject to sampling errors.
For t-tests, it is suggested that $(k - 1)$ degrees of freedom be assigned
to the standard error, although the distributions involved have not
yet been adequately investigated.
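The calculation of table 6 can be reproduced with a few lines of code. This is a sketch only: the function name is illustrative, and the first total variance, which is truncated in the printed table, is taken as 6,780 (an assumption consistent with the column mean quoted earlier). Because the weights are not rounded to three figures here, the result agrees with the text only to within rounding.

```python
import math


def semi_weighted_mean(x, total_variances):
    """Weight each estimate by the reciprocal of its estimated total variance."""
    weights = [1.0 / v for v in total_variances]
    total = sum(weights)
    mean = sum(w * xi for w, xi in zip(weights, x)) / total
    std_err = 1.0 / math.sqrt(total)   # ignores sampling error in the weights
    return mean, std_err


x = [86.4, 158.0, 11.3]
total_variances = [6780.0, 6143.0, 5711.0]   # first entry is an assumed reading
mean_sw, se_sw = semi_weighted_mean(x, total_variances)
print(f"semi-weighted mean = {mean_sw:.1f}, s.e. = {se_sw:.1f}")
```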


5. EXPERIMENTS OF UNEQUAL PRECISION PER OBSERVATION 


The methods in this section are to be used when Bartlett’s $\chi^2$ is
significant or when, for any other reason, the investigator does not wish
to assume that the variances per observation are equal. The methods
are the same whether the experiments are identical in size and type or
not.

As before, the estimate in the $i$th experiment is denoted by $x_i$, and
$s_i^2$ is an unbiased estimate of the error variance of $x_i$, based on $n_i$ degrees
of freedom.


The test for interactions. 


The first step is to test for the presence of interactions. One approach
is to calculate the ordinary mean square deviation of the $x_i$, i.e.

$$s_x^2 = \frac{\sum (x_i - \bar{x})^2}{k - 1}$$

If there are no interactions, this quantity is an unbiased estimate of

$$\bar{\sigma}^2 = \frac{\sum \sigma_i^2}{k}$$

Consequently, an approximate F-test of the interactions is made
from the ratio

$$F = \frac{s_x^2}{\bar{s}^2}$$

where $\bar{s}^2 = \sum s_i^2/k$.

From an elementary point of view, the degrees of freedom might
be taken as $(k - 1)$ and $\sum n_i$. Actually, the calculated $F$ does
not follow the tabular F-distribution, because the $x_i$ vary in precision.
The tabular F-distribution might still be used, as an approximation,
by reducing the numbers of degrees of freedom ascribed to $F$. Following
equation (9), the degrees of freedom in the numerator may be taken as

$$\frac{(k-1)^2 \bar{V}^2}{(k-2)V_2 + \bar{V}^2} \qquad (11)$$

where

$$\bar{V} = \frac{\sum s_i^2}{k} : \qquad V_2 = \frac{\sum s_i^4}{k}$$

For the denominator, a familiar rule is

$$\frac{\left(\sum s_i^2\right)^2}{\sum \left(s_i^4/n_i\right)} \qquad (12)$$


There is, however, the further objection that F may be insensitive in 
detecting the presence of interactions, because experiments with low 
precision receive the same weight as those with high precision. 

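An alternative course, less open to these objections, is to base the
test of interactions on the weighted sum of squares of deviations

$$Q = \sum_{i=1}^{k} w_i (x_i - \bar{x}_w)^2,$$

where

$$w_i = \frac{1}{s_i^2} : \qquad \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$$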

If the degrees of freedom $n_i$ are large, $Q$ follows the $\chi^2$ distribution
with $(k - 1)$ degrees of freedom. For moderate values of $n_i$, adjustments
which transform $Q$ so as to give a better approximation to $\chi^2$
have been worked out by Cochran (1937) for $n_i$ all equal, and by
James (1951). Welch (1951) transforms $Q$ so that it may be referred
to the F-table: his test and that of James are similar in that both neglect
terms of order $1/n_i^2$. Although the range of applicability of these
tests, which are all approximations, is not yet known, it is suggested
that they be used down to $n_i = 6$. To perform Welch’s test, we compute
the auxiliary quantity

$$a = \sum_{i=1}^{k} \frac{1}{n_i}\left(1 - \frac{w_i}{w}\right)^2$$

where $w = \sum w_i$. Then

$$F_w = \frac{Q}{(k-1) + \dfrac{2(k-2)a}{(k+1)}}$$

with degrees of freedom

$$n_1 = (k - 1) : \qquad n_2 = \frac{(k^2 - 1)}{3a}$$


The $F$ and $F_w$ tests will be illustrated by the data in table 2 for
example 1. Although this example was analysed under the assumption
that the experiments were of equal precision per observation, the
probability value for Bartlett’s $\chi^2$, 0.06, casts doubt on this assumption.
The first step is to compute the quantities $s_i^2$, obtained by dividing each
experiment’s own error mean square by $f_i$: these are the
estimated variances of the $x_i$. The remainder of the calculations are
arranged in table 7.


TABLE 7
CALCULATIONS FOR THE TEST OF INTERACTIONS

  x_i     s_i²    10² w_i    w_i x_i    w_i/w    (1 - w_i/w)²    n_i    (1 - w_i/w)²/n_i
  86.4    1719     .0582     .05028     .275        .526          10         .0526
 158.0    3348     .0299     .04724     .141        .738          14         .0527
  11.3     807     .1239     .01400     .584        .173          16         .0108
          5874     .2120     .11152    1.000                              a = .1161
The F-test.

$$\sum (x_i - \bar{x})^2 = 10{,}763 : \qquad \bar{s}^2 = 1{,}958 : \qquad F = \frac{5{,}381}{1{,}958} = 2.75$$

The degrees of freedom from an elementary point of view would be 2
and 40. Formulas (11) and (12) will be found to give 1.7 and 30.3
degrees of freedom, respectively. By interpolation between F(1, 30)
and F(2, 30), the significance probability comes out at 0.09.
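The F-ratio and its approximate degrees of freedom can be recomputed from table 7 as follows. The sketch assumes that formulas (11) and (12) have the forms (k-1)^2 Vbar^2 / {(k-2) V_2 + Vbar^2} and (sum s_i^2)^2 / sum(s_i^4 / n_i), with Vbar and V_2 the mean and mean square of the s_i^2; the function name is illustrative.

```python
def f_test_interactions(x, s2, n):
    """Unweighted F-test of interactions with approximate degrees of freedom.

    x: estimates, s2: their estimated variances, n: error d.f. of each.
    """
    k = len(x)
    x_bar = sum(x) / k
    s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (k - 1)   # numerator mean square
    s_bar2 = sum(s2) / k                                   # average error variance
    v_2 = sum(v * v for v in s2) / k
    df_num = (k - 1) ** 2 * s_bar2 ** 2 / ((k - 2) * v_2 + s_bar2 ** 2)
    df_den = sum(s2) ** 2 / sum(v * v / ni for v, ni in zip(s2, n))
    return s_x2 / s_bar2, df_num, df_den


x = [86.4, 158.0, 11.3]
s2 = [1719.0, 3348.0, 807.0]
n = [10, 14, 16]
f_ratio, df_num, df_den = f_test_interactions(x, s2, n)
print(f"F = {f_ratio:.2f} on {df_num:.1f} and {df_den:.1f} d.f.")
```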


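The $F_w$-test.

$$Q = \sum w_i x_i^2 - \frac{\left(\sum w_i x_i\right)^2}{w} = 11.966 - (.11152)^2/(.002120) = 6.10$$

$$F_w = \frac{6.10}{2 + \frac{2}{4}(.1161)} = \frac{6.10}{2.058} = 2.96$$

$$n_2 = \frac{8}{3(.1161)} = 23.0$$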

The significance probability is about 0.07. 
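A sketch of the weighted test, assuming the auxiliary quantity a = sum (1 - w_i/w)^2 / n_i and the form F_w = Q / {(k-1) + 2(k-2)a/(k+1)} with (k-1) and (k^2-1)/(3a) degrees of freedom. The function name is illustrative, and the significance probability itself is not computed, since that would need an F-distribution routine.

```python
def welch_test(x, s2, n):
    """Weighted test of interactions for estimates of unequal precision."""
    k = len(x)
    w = [1.0 / v for v in s2]                 # weights 1 / s_i^2
    w_tot = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / w_tot
    q = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    a = sum((1.0 - wi / w_tot) ** 2 / ni for wi, ni in zip(w, n))
    f_w = q / ((k - 1) + 2.0 * (k - 2) * a / (k + 1))
    df = (k - 1, (k * k - 1) / (3.0 * a))
    return q, a, f_w, df


x = [86.4, 158.0, 11.3]
s2 = [1719.0, 3348.0, 807.0]
n = [10, 14, 16]
q, a, f_w, df = welch_test(x, s2, n)
print(f"Q = {q:.2f}, a = {a:.4f}, F_w = {f_w:.2f}, d.f. = {df[0]} and {df[1]:.1f}")
```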

The test of interactions is of importance because, as pointed out in 
section 2, the presence of interactions affects our interpretation of the 
data and may determine the kind of mean that will be useful. In a 
borderline case, as in this example, the investigator should take into 
account both the significance probability and any other knowledge of 


the data in deciding whether to regard interactions as present or absent. 
The conservative decision, when in doubt, is to assume interactions 
present, since the techniques for this situation remain valid even if 
interactions are absent. 

Experience in the application of the F- and $F_w$-tests indicates that
although the $F_w$-test is more sensitive, the F-test, which is simpler to
compute, is usually adequate for diagnostic purposes. Consequently,
the working rules to be given later are based on the value of $F$.


Interactions absent. 


If we are willing to assume that interactions are absent, one method
of combination is to weight each $x_i$ inversely as its estimated variance
$s_i^2$, forming the weighted mean

$$\bar{x}_w = \frac{\sum w_i x_i}{w} : \qquad w = \sum w_i$$

The standard error of $\bar{x}_w$ is given approximately by a formula due to
Meier (1953), with an adjustment by Cochran and Carroll (1953),

$$\text{s.e.}(\bar{x}_w) = \sqrt{\frac{1}{w}\left\{1 + \frac{4}{w^2}\sum \frac{1}{n_i'}\, w_i (w - w_i)\right\}} \qquad (13)$$

where

$$n_i' = n_i - \frac{4(k-2)}{(k-1)}$$

If the $n_i$ are all equal, formula (13) reduces to the slightly simpler
expression

$$\text{s.e.}(\bar{x}_w) = \sqrt{\frac{1}{w}\left\{1 + \frac{4}{n'}\left(1 - \frac{\sum w_i^2}{w^2}\right)\right\}} \qquad (14)$$

The term inside the brackets is an adjustment which takes account
of sampling errors in the weights $1/s_i^2$ as estimates of the true weights
$1/\sigma_i^2$, and also of the fact that the principal term inside the square
root, $1/w$, tends to be an underestimate of the corresponding population
expression. These formulas require $n_i > 8$: for values of $n_i$ below 8,
see section 8.

For the approximate number of degrees of freedom $n_e$ to be attached
to this standard error, Meier (1953) suggests

$$n_e = \frac{w^2}{\sum \dfrac{w_i^2}{n_i}} \qquad (15)$$


If the $n_i$ are small, the sampling errors in the weights may be large
enough so that the weighted mean is no more precise than the unweighted
mean $\bar{x}$, whose standard error is

$$\text{s.e.}(\bar{x}) = \frac{\sqrt{\sum s_i^2}}{k} \qquad (16)$$

The approximate number of degrees of freedom $n_e'$ for this s.e. is

$$n_e' = \frac{\left(\sum s_i^2\right)^2}{\sum \left(s_i^4/n_i\right)}$$

In seeking some rule which will help in deciding whether to use $\bar{x}$
or $\bar{x}_w$, it is natural to try to base the rule on the value of Bartlett’s
$\chi^2$, since this will already have been calculated in many cases.
Unfortunately, the relation between this $\chi^2$ and the relative precision of
$\bar{x}$ to $\bar{x}_w$ is not simple. When the degrees of freedom $n_i$ are large, $\chi^2$ can
detect relatively small differences in precision which make $\bar{x}_w$ only
slightly more precise than $\bar{x}$. When the $n_i$ are small, on the other hand,
$\chi^2$ may sometimes be non-significant even when $\bar{x}_w$ would be substantially
better than $\bar{x}$. As a rough guide to the relative precision $R$ of
$\bar{x}$ to $\bar{x}_w$, the following formula is suggested.

(17) 


where 


n= 


This formula was derived as a mathematical approximation and has
been checked on a number of sets of data.

Since $\bar{x}$ is preferable on account of its simplicity unless $\bar{x}_w$ brings a
worthwhile gain in precision, the investigator will not go far wrong in
using $\bar{x}$ unless $R$ is less than 0.9. My experience with actual data has
been that often there is little to choose between $\bar{x}$ and $\bar{x}_w$, but
occasionally $\bar{x}_w$ wins handsomely.

A warning given by Yates and Cochran (1938) should be repeated.
It sometimes happens that there is a correlation between $x_i$ and $s_i^2$,
for instance when experiments which have large responses also exhibit
high variability. In this event a weighted mean gives too much weight
to experiments where the response is low and will be biased.

Example 2 illustrates the rule for choosing between $\bar{x}$ and $\bar{x}_w$.




Example 2. 


The data in table 8 are the responses in sugar per acre to an application
of superphosphate in 4 experiments on heavy loam soils in the
1936 series of fertilizer trials on sugar-beet in England. Each experiment
provided 15 degrees of freedom for error, giving $\sum n_i = 60$.


TABLE 8
RESULTS OF 4 EXPERIMENTS ON SUGAR-BEET

Response to P (cwt)
        x_i            s_i²       w_i = 1/s_i²
       +1.3           4.973          0.20
       +0.4           1.416          0.71
       +0.7           6.864          0.15
       +2.5           2.958          0.34
Total                16.211          1.40 = w

The value of Bartlett’s $\chi^2$ is 9.27, with a probability of about 0.03.
In the test for interactions the value of $F$ is less than 1. The response
to P had also shown no sign of interactions in several other sets of these
sugar-beet experiments, so that the assumption of negligible interactions
appeared justifiable.

By formula (18), the crude estimate of $R$ is


The weighted mean is suggested. Its value is

$$\bar{x}_w = \frac{(1.3)(0.20) + (0.4)(0.71) + (0.7)(0.15) + (2.5)(0.34)}{1.40} = 1.07$$


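For the standard error, the simpler form in equation (14) can be
used.

$$n' = n - \frac{4(k-2)}{(k-1)} = 15 - \frac{8}{3} = 12.3$$

Hence by equation (14),

$$\text{s.e.}(\bar{x}_w) = \sqrt{\frac{1}{1.40}\left\{1 + \frac{4}{12.3}\left(1 - \frac{.6822}{1.96}\right)\right\}} = 0.93$$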




From equation (15), the approximate number of degrees of freedom is

$$n_e = \frac{w^2}{\sum \dfrac{w_i^2}{n_i}} = \frac{(15)(1.96)}{.6822} = 43$$


The unweighted mean may be verified to be 1.22 ± 1.01.
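The figures of example 2 can be checked with the sketch below. It assumes that equation (14) has the form sqrt[(1/w){1 + (4/n')(1 - sum w_i^2 / w^2)}] with n' = n - 4(k-2)/(k-1), that equation (15) is w^2 / sum(w_i^2 / n_i), and that equation (16) is sqrt(sum s_i^2)/k. Weights are not rounded to two decimals, so results agree with the text only to within rounding.

```python
import math

x = [1.3, 0.4, 0.7, 2.5]              # responses to P (cwt), table 8
s2 = [4.973, 1.416, 6.864, 2.958]     # estimated variances, table 8
n = 15                                # error d.f. in every experiment
k = len(x)

w = [1.0 / v for v in s2]
w_tot = sum(w)
x_w = sum(wi * xi for wi, xi in zip(w, x)) / w_tot

# Standard error by the assumed form of equation (14).
n_adj = n - 4.0 * (k - 2) / (k - 1)
spread = 1.0 - sum(wi * wi for wi in w) / w_tot ** 2
se_w = math.sqrt((1.0 + (4.0 / n_adj) * spread) / w_tot)

# Approximate degrees of freedom by the assumed form of equation (15).
df_w = w_tot ** 2 / sum(wi * wi / n for wi in w)

# Unweighted mean and its standard error, assumed form of equation (16).
x_bar = sum(x) / k
se_bar = math.sqrt(sum(s2)) / k

print(f"weighted:   {x_w:.2f} +/- {se_w:.2f} on about {df_w:.0f} d.f.")
print(f"unweighted: {x_bar:.2f} +/- {se_bar:.2f}")
```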

When the numbers of degrees of freedom in the individual experiments
are less than 8, the weighted mean will seldom be more precise
than the unweighted mean. With the weighted mean, one or two
experiments tend to receive very large weights and almost determine
the value of the overall mean. If Bartlett’s $\chi^2$ is large, the investigator
may still feel that some kind of weighting is desirable. A suggested
procedure is partial weighting (Yates and Cochran, 1938). The same
weight is given to all experiments with relatively low values of $s_i^2$,
this weight being $\bar{w}_p = 1/\bar{s}_p^2$, where $\bar{s}_p^2$ is the mean of the $s_i^2$ over those
experiments that are chosen to have equal weight. Each of the remaining
experiments receives its individual weight $w_i = 1/s_i^2$.

The choice of the number of experiments that are to receive equal
weight is to some extent arbitrary. A good working rule is to give
equal weight to between 1/2 and 2/3 of the experiments (Cochran,
1937). The method prevents an experiment which happens to have a
small estimated error from dominating the result, while allowing the
less precise experiments to receive lower weights.


Example 3. 


In studies by the U. S. Public Health Service of observers’ abilities 
to count the number of flies which settle momentarily on a grill, each of 
7 observers was shown, for a brief period, grills with known numbers of 
flies impaled on them and asked to estimate the numbers. For a given 
grill, each observer made 5 independent estimates. The data in table 
9 are for a grill which actually contained 161 flies. Estimated variances 
are based on 4 degrees of freedom each. 

The value of Bartlett’s $\chi^2$ was 19.9, with 6 degrees of freedom and a
significance probability of less than 0.01. Evidently the observers
differ in precision. The F-value in the test for interactions was practically
1, giving no indication of any differential bias in observers’ error.

The only point of interest in estimating the overall mean is to test 
whether there is any consistent bias among observers in estimating the 
161 flies on the grill. Although inspection of table 9 suggests no such 
bias, the data will serve to illustrate the application of partial weighting. 

It is clear from table 9 that if weighting inversely as $s_i^2$ were employed,
observer 2 would have great influence on the estimate. For partial
weighting, we give the same weight to observers 1, 2, 6, and 7. Since
$\bar{s}_p^2$ is 77.6 for these observers, $\bar{w}_p = .0129$. The partial weights appear
at the right of table 9.


TABLE 9
OBSERVERS’ MEAN ESTIMATES AND ERROR VARIANCES

Observer    Mean estimate x_i      s_i²      Partial weights
   1             183.2             117.0         .0129
   2             149.0               8.1         .0129
   3             154.0             235.9         .0042
   4             167.2             295.0         .0034
   5             187.2            1064.6         .0009
   6             158.0              51.2         .0129
   7             143.0             134.0         .0129
                                             w = .0601


$$\bar{x}_{pw} = \frac{(.0129)(183.2) + \cdots + (.0129)(143.0)}{.0601}$$


For the standard error, let

u = no. of experiments given individual weights = 3
$w_u$ = total weight for these u experiments = 0.0085
p = no. of experiments given the same weight = 4
$\bar{n}_p$ = average no. of d.f. for these p experiments = 4

If $\bar{n}_p$ is less than 8,

$$\text{s.e.}(\bar{x}_{pw}) = \frac{\sqrt{p\bar{w}_p + \lambda w_u}}{w} \qquad (19)$$

where $\lambda$ is read as a function of $\bar{n}_p$ and u from table 12 as described in
section 8. In this example, with $\bar{n}_p = 4$ and u = 3, $\lambda = 1.8$.

$$\text{s.e.}(\bar{x}_{pw}) = \frac{\sqrt{(4)(.0129) + (1.8)(.0085)}}{.0601} = 4.3$$

For $\bar{n}_p$ greater than 8, the standard error may be taken as approximately

$$\text{s.e.}(\bar{x}_{pw}) = \frac{1}{w}\sqrt{\left\{p\bar{w}_p + w_u + \frac{4}{w_u}\sum \frac{1}{n_i'}\, w_i (w_u - w_i)\right\}} \qquad (20)$$

where the $\sum$ is taken over the u experiments only and

$$n_i' = n_i - \frac{4(u-2)}{(u-1)}$$


Formulas (19) and (20) are revisions of an earlier formula, given by
Yates and Cochran (1938), which assumed p and u to be large. In
this example, formula (20), although outside of the range of its applicability,
agrees well with (19), giving a value of 4.5 for the standard error.


Interactions present. 


In this case we again have the model

$$x_i = \mu + (\mu_i - \mu) + e_i$$

and the variance of $x_i$ is $(\sigma_i^2 + \sigma_\mu^2)$. The choice of estimate lies between the
unweighted mean and the sample semi-weighted mean $\sum \hat{W}_i x_i/\sum \hat{W}_i$,
where

$$\hat{W}_i = \frac{1}{s_\mu^2 + s_i^2}$$

The quantity $s_\mu^2$ is computed by formula

$$s_\mu^2 = \frac{\sum (x_i - \bar{x})^2}{(k - 1)} - \bar{s}^2$$

where $\bar{s}^2$ is the mean of the $s_i^2$.

As explained in section 4, the relative precision of $\bar{x}$ and $\bar{x}_{sw}$ depends
on the size of $\sigma_\mu^2$ and on the amount of variation among the $\sigma_i^2$. Use
of $\bar{x}$ when $F$ exceeds 4 is a safe working rule, unless there are extremely
large variations in the precisions of the individual $x_i$.


Example 4.


Example 1, previously discussed, represents a situation where the
data do not indicate very clearly what kind of model and analysis are
appropriate. The probability value for Bartlett’s $\chi^2$, 0.06, made it
doubtful whether equal precision per observation could be postulated.
Although this assumption was adopted in the original analysis, tests for
interactions without making this assumption were carried out in section 5.
The $F$ and $F_w$ values gave probabilities of 0.09 and 0.07, raising
the further question whether interactions should be considered as
present or absent. Since, however, the value of $F$ was 2.75, the more
cautious procedure is to recognize that interactions may be present
and use a semi-weighted mean. The subsidiary computations needed
are given in table 10.


TABLE 10
COMPUTATIONS FOR THE SEMI-WEIGHTED MEAN

Expt.      x_i       s_i²      s_μ² + s_i²     10⁶ W_i
  1        86.4     1,719        5,142           194
  2       158.0     3,348        6,771           148
  3        11.3       807        4,230           236

Totals    255.7     5,874                        578

s_μ² = 5,381 - 1,958 = 3,423


The calculation proceeds in the right hand columns of table 10.
We find

$$\bar{x}_{sw} = 74.1 : \qquad \text{s.e.}(\bar{x}_{sw}) = \frac{10^3}{\sqrt{578}} = 41.6$$

The estimate originally made in example 1 was 85.2 ± 42.3. The
two estimates do not agree very closely. The difference is due to an
apparently fortuitous correlation between $x_i$ and $s_i^2$.

The remaining sections deal with the derivation of some of the appropriate
formulas.
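The computations of table 10 can be reproduced with the sketch below, which follows the steps described for example 4: estimate the interaction variance as the mean square deviation of the x_i less the average error variance, then weight by the reciprocal of the total variance. The guard against a negative variance estimate is an addition made here for safety and is not part of the text. Weights are not rounded, so the mean agrees with the printed value only to within rounding.

```python
import math

x = [86.4, 158.0, 11.3]
s2 = [1719.0, 3348.0, 807.0]
k = len(x)

x_bar = sum(x) / k
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (k - 1)   # mean square deviation
s_bar2 = sum(s2) / k                                   # average error variance
s_mu2 = max(s_x2 - s_bar2, 0.0)    # guard against a negative estimate (assumption)

semi_weights = [1.0 / (s_mu2 + v) for v in s2]
w_total = sum(semi_weights)
x_sw = sum(w * xi for w, xi in zip(semi_weights, x)) / w_total
se_sw = 1.0 / math.sqrt(w_total)

print(f"interaction variance estimate = {s_mu2:.0f}")
print(f"semi-weighted mean = {x_sw:.1f} +/- {se_sw:.1f}")
```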


6. COMPARISON OF THE UNWEIGHTED AND SEMI-WEIGHTED MEANS


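Given that interactions are present, the variance of the unweighted
mean is

$$V(\bar{x}) = \frac{\sigma_\mu^2 + \bar{\sigma}^2}{k} \qquad (21)$$

If the semi-weights are known exactly, the variance of the semi-weighted
mean is

$$V(\bar{x}_{sw}) = \frac{1}{\sum W_i} \quad \text{where} \quad W_i = \frac{1}{\sigma_\mu^2 + \sigma_i^2} \qquad (22)$$

Owing to errors in the weights, the variance of the sample semi-weighted
mean will be greater than (22). Hence the ratio of (21) to
(22) gives an upper limit to the relative precision of $\bar{x}_{sw}$ to $\bar{x}$. This
ratio is

$$\lambda = \frac{(\sigma_\mu^2 + \bar{\sigma}^2)}{k} \sum \frac{1}{\sigma_\mu^2 + \sigma_i^2} \qquad (23)$$

If

$$I = \sigma_\mu^2/\bar{\sigma}^2$$

is the ratio of the interaction variance to the average error variance,
(23) may be written

$$\lambda = \frac{(I + 1)}{k} \sum \frac{1}{I + \sigma_i^2/\bar{\sigma}^2} \qquad (24)$$

In the development of a working rule about the choice between $\bar{x}$
and $\bar{x}_{sw}$, the first step is to find an upper limit to $\lambda$ when we fix the
two quantities $I$ and the ratio $r$ of the greatest to the smallest error
variances. For the following argument, I am indebted to Dr. Paul
Meier.

Let $\sigma_1^2$ be the smallest error variance, and let

$$r_i = \sigma_i^2/\sigma_1^2 : \qquad R = \sum r_i$$

Then

$$\bar{\sigma}^2 = \frac{R\sigma_1^2}{k} : \qquad (\sigma_\mu^2 + \sigma_i^2) = (AR + r_i)\,\sigma_1^2$$

Hence (24) becomes

$$\lambda = \frac{(I + 1)}{k^2}\, R \sum \frac{1}{AR + r_i} \qquad (26)$$

where $A = I/k$.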


The argument proceeds by showing that $\lambda$ cannot have a maximum
unless every $r_i$ is at one of the ends of its possible range from 1 to $r$.
Since $I$ is fixed, we may neglect the term $(I + 1)/k^2$ and consider the
quantity

$$y = R \sum \frac{1}{AR + r_i} \qquad (27)$$


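It may be verified that

$$\frac{\partial y}{\partial r_t} = {\sum}' \frac{r_i}{(AR + r_i)^2} - \frac{R'}{(AR + r_t)^2}$$

where a prime denotes summation over all terms except that in $r_t$.
Hence, at any point at which $\partial y/\partial r_t = 0$, we have

$$\frac{\partial^2 y}{\partial r_t^2} = \frac{2(A + 1)R'}{(AR + r_t)^3} - 2A\,{\sum}' \frac{r_i}{(AR + r_i)^3}$$

$$= \frac{2(A + 1)}{(AR + r_t)}\,{\sum}' \frac{r_i}{(AR + r_i)^2}\left\{1 - \frac{A}{(A + 1)}\,\frac{(AR + r_t)}{(AR + r_i)}\right\}$$

The term inside the curly brackets is easily seen to be positive for
every term, since $R$ is never less than $r_t$. Hence at any point where
$\partial y/\partial r_t = 0$, we have the second derivative positive, so that there is
no interior maximum.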

To find the maximum value of $\lambda$, let $m$ of the $r_i$ be 1, and the remaining
$(k - m)$ be $r$. Then from (26)

$$\lambda = \frac{(I + 1)R}{k^2}\left\{\frac{m}{AR + 1} + \frac{k - m}{AR + r}\right\}$$

where now

$$R = m + (k - m)r$$

There is no convenient analytic expression for the maximizing value
of $m$, but for given $I$ and $r$ the maximum is easily computed numerically.
The results in table 11 show the reciprocal of the maximum, i.e. the
lower bound to the relative precision of $\bar{x}$ to $\bar{x}_{sw}$.


TABLE 11
LOWER LIMITS OF RELATIVE PRECISION OF $\bar{x}$ TO $\bar{x}_{sw}$

r = largest /         I = ratio of interaction variance to average error variance
smallest error
variance                  0         1         2         3

     2                             .97       .99       .99
     3                   .75       .93       .97       .98
     4                   .64                 .95       .97
     6                   .49       .85       .93       .95
     8                   .40       .81       .90       .94
    16                   .22       .74       .86       .91


If an upper limit of 10 per cent in the loss of precision is regarded as
tolerable, table 11 shows that the unweighted mean is satisfactory
whenever $I$ exceeds 3, or when $I$ is at least 2 and $r$ is 8 or less, or when
$I$ is at least 1 and $r$ is 4 or less. Values of $r$ greater than 16 were not
included in the table, on the grounds that such cases would represent
a very extreme degree of variation in the $\sigma_i^2$.
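The numerical maximization described above can be sketched as follows. The sketch assumes the expression for lambda given after (26), with m of the r_i equal to 1 and the rest equal to r. The number of experiments k used to produce the printed table is not stated, so k = 1000 here is an arbitrary choice; the function name is illustrative.

```python
def relative_precision_lower_bound(interaction_ratio, r, k=1000):
    """Reciprocal of the largest lambda over m = 1, ..., k."""
    a = interaction_ratio / k                     # A = I / k
    best = 0.0
    for m in range(1, k + 1):   # m experiments at r_i = 1, k - m at r_i = r
        big_r = m + (k - m) * r                   # R = sum of the r_i
        lam = ((interaction_ratio + 1) * big_r / k ** 2) * (
            m / (a * big_r + 1) + (k - m) / (a * big_r + r)
        )
        best = max(best, lam)
    return 1.0 / best


for r in (2, 3, 4, 6, 8, 16):
    row = [relative_precision_lower_bound(i, r) for i in (0, 1, 2, 3)]
    print(r, " ".join(f"{value:.2f}" for value in row))
```

Because the expression depends on m and k only through the fraction m/k, a large k simply gives a fine grid over that fraction.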

The translation of these results into the working rules given in
sections 4 and 5 can be made only approximately, since in practice we
do not know the value of $I$. In section 5, the numerator of $F$ is an
unbiased estimate of $(\sigma_\mu^2 + \bar{\sigma}^2)$, while the denominator is an unbiased
estimate of $\bar{\sigma}^2$. Hence, $F$ can be considered as an estimate of $(1 + I)$,
although the estimate may be shown to be positively biased. The
rule given in section 5, namely to use $\bar{x}$ in general when $F$ exceeds 4,
was chosen because with $F > 4$, $I$ is unlikely to be $< 2$, and from
table 11 the unweighted mean suffers little loss for $I = 2$ unless the
$x_i$ differ greatly in precision.

Similarly, the F-ratio in table 3 of section 4 is an estimate of

$$1 + \frac{I f'}{k} \sum \frac{1}{f_i}$$

If the range of values of $f_i$ is not too great, this expression is approximately
$(1 + I)$, and leads to the rules given in table 4.

These rules are perhaps biased in favor of the semi-weighted mean,
because the figures in table 11 are underestimates of the relative precision
of $\bar{x}$ to $\bar{x}_{sw}$.


7. APPROXIMATE NUMBER OF DEGREES OF FREEDOM IN THE STANDARD ERROR OF $\bar{x}$


In sections 4 and 5 the formula

$$\text{s.e.}(\bar{x}) = \frac{s_x}{\sqrt{k}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{k(k - 1)}}$$

has been recommended for the standard error of $\bar{x}$ when interactions
are present. Since the $x_i$ do not have equal variances, this standard
error is not distributed in the usual way for a root mean square with
$(k - 1)$ degrees of freedom.

The distribution of $s_x^2$ will, however, be approximated by a distribution
of the usual type for a mean square. The number of degrees of
freedom $n_e$ ascribed to this distribution will be chosen so as to give it
the correct variance.


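If the $x_i$ are normally and independently distributed about $\mu$, the
variance of $s_x^2$ is found by algebra to be

$$V(s_x^2) = \frac{2\{(k - 2)V_2 + \bar{V}^2\}}{(k - 1)^2} \qquad (28)$$

where $\bar{V}$ and $V_2$ denote the mean and the mean square of the variances of
the $x_i$. For the typical distribution of a mean square (i.e. that of a multiple of
$\chi^2$ with $n_e$ degrees of freedom),

$$V(s_x^2) = \frac{2\{E(s_x^2)\}^2}{n_e} = \frac{2\bar{V}^2}{n_e} \qquad (29)$$

Hence, by equating (28) to (29),

$$n_e = \frac{(k - 1)^2 \bar{V}^2}{(k - 2)V_2 + \bar{V}^2} \qquad (30)$$

In practice, we must substitute the sample estimates of $\bar{V}$ and $V_2$.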


8. THE STANDARD ERROR OF THE WEIGHTED MEAN WHEN THE $n_i$ ARE SMALL


Meier’s formula (13) or (14) in section 5 is satisfactory for values of
$n_i$ down to 8, or down to 6 when $k$ is small. For very small values of
$n_i$, the value of the factor in curly brackets in (13) and (14) has been
estimated by experimental sampling (Cochran and Carroll, 1953).
Values taken from this paper appear in table 12.


TABLE 12
VALUES OF $\lambda$ FOR WHICH $\lambda/w$ IS AN ESTIMATE OF $V(\bar{x}_w)$


Number of Experiments

2     2.0    2.9    3.9    5.1    6.1    7.9    10.6    12.6    17.1    22.8


The use of this formula is illustrated in section 9. 




9. STANDARD ERROR OF THE PARTIALLY WEIGHTED MEAN


Let $\bar{x}_p$ be the mean of the p experiments which each receive equal
weight $\bar{w}_p$, and $\bar{x}_u$ be the weighted mean of the remaining u experiments
which receive individual weights. For the estimated variance of $\bar{x}_p$
we take the average observed variance in these experiments, divided
by p; or in other words,

$$V(\bar{x}_p) = \frac{1}{p\bar{w}_p}$$

For the estimated variance of $\bar{x}_u$, we use either Meier’s formula or,
if the $n_i$ are less than 8, the empirical formula in the previous section,

$$V(\bar{x}_u) = \frac{\lambda}{w_u}$$

where $\lambda$ is read from table 12 and $w_u$ is the total weight for these u
experiments.

The overall mean is

$$\bar{x}_{pw} = \frac{p\bar{w}_p \bar{x}_p + w_u \bar{x}_u}{p\bar{w}_p + w_u}$$

Hence, if $\bar{w}_p$ and $w_u$ can be regarded as free from error,

$$V(\bar{x}_{pw}) = \frac{p\bar{w}_p + \lambda w_u}{w^2}$$

$$\text{s.e.}(\bar{x}_{pw}) = \frac{\sqrt{p\bar{w}_p + \lambda w_u}}{w}$$

as given in section 5.

This argument is non-rigorous in several ways. The variance given
for $\bar{x}_p$ is too low, since these p experiments are selected because they
appear to be precise. Similarly, the variance for $\bar{x}_u$ is too high. Also,
errors in the relative weights, $p\bar{w}_p$ and $w_u$, are ignored. However, the
formula does reduce to the appropriate values when p = k and when
p = 0: in intermediate cases my guess is that it may be slightly too low.


10. SUMMARY 


This paper discusses methods for combining a number of estimates
$x_i$ of some quantity $\mu$, made in different experiments. For the $i$th
estimate we have an unbiased estimate $s_i^2$ of its variance, based on
$n_i$ degrees of freedom.

It is important to find out whether the $x_i$ agree with one another
within the limits of their experimental errors. If they do not, i.e. if
interactions are present, the type of overall mean that will be useful
for future action requires careful consideration. However, in most
cases the problem of estimating the mean of the $x_i$, at least over some
subgroup of the experiments, will remain.

If the experiments are of the same type and the $x_i$ are of equal
precision, the best estimate in general is the unweighted mean $\bar{x}$, but
its standard error differs according as interactions are present or absent.

The second case considered is that in which the experiments are of
different types, but the variance $\sigma^2$ per observation is the same in all
experiments. The variance of $x_i$ is then of the form $\sigma^2/f_i$. If there
are no interactions, the best combined estimate is the weighted mean
$\sum f_i x_i/\sum f_i$. If interactions exist, the choice lies between the
unweighted mean and a semi-weighted mean. Recommendations for
this choice are given. In the semi-weighted mean, the weights $W_i$ are
ideally inversely proportional to

$$\sigma_\mu^2 + \frac{\sigma^2}{f_i}$$

The semi-weighted mean reduces to the weighted mean when the interaction
variance $\sigma_\mu^2 = 0$, and to the unweighted mean when the interaction
variance is large. In practice, sample estimates of the weights are used.

Experiments in which the variance per observation is not constant
represent perhaps the most common case in practice. In the absence
of interactions, possible estimates are the unweighted mean, weighting
inversely as $s_i^2$ or, if the $n_i$ are small, a kind of partial weighting. When
interactions are present, the unweighted mean or the semi-weighted
mean is appropriate. Working rules are given to aid in the selection
of an estimate.

In conclusion, the unweighted mean will probably be satisfactory
with many sets of data. The principal value in learning about various
types of more complex estimates lies in occasional situations in which
the unweighted mean would incur a substantial loss of precision, and
also in receiving assurance that the unweighted mean is often entirely
adequate.

Some approximations in theory that are needed for the practical
recommendations are developed in later sections of the paper.

REFERENCES 


Bancroft, T. A. (1944). On biases in estimation due to the use of preliminary tests of 
significance. Ann. Math. Stat., 15, 190-204. 

Bliss, C. I. (1952). The statistics of bioassay. Academic Press Inc., New York. 
p. 576. 




Cochran, W. G. (1937). Problems arising in the analysis of a series of similar experi- 
ments. Jour. Roy. Stat. Soc., Supp., 4, 102-118. 

Cochran, W. G. and Carroll, S. P. (1953). A sampling investigation of the efficiency 
of weighting inversely as the estimated variance. Biometrics, 9, 447-459. 

Cochran, W. G. and Cox, G. M. (1950). Experimental designs. John Wiley and Sons, 
Inc., New York. Chapter 14. 

James, G. S. (1951). The comparison of several groups of observations when the 
ratios of the population variances are unknown. Biometrika, 38, 324-329. 

Kempthorne, O. (1952). The design and analysis of experiments. John Wiley and 
Sons, Inc., New York. Chapter 28. 

Meier, P. (1953). Variance of a weighted mean. Biometrics, 9, 59-73. 

Paull, A. E. (1950). On a preliminary test for pooling mean squares in the analysis of
variance. Ann. Math. Stat., 21, 539-556. 

Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance 
components. Biom. Bull., 2, 110. 

Welch, B. L. (1951). On the comparison of several mean values—an alternative 
approach. Biometrika, 38, 330-336. 

Whitlock, J. H. and Bliss, C. I. (1943). A bioassay technique for anthelmintics. 
Jour. Parasitology, 29, 48-58.

Yates, F. and Cochran, W. G. (1938). The analysis of groups of experiments. Jour. 
Agr. Sci., 28, 556-580. 




THE ANALYSIS OF VARIANCE WITH VARIOUS BINOMIAL 
TRANSFORMATIONS 


Professor Sir Ronald Fisher
Department of Genetics, Cambridge 


1. Introductory. 


Much experimental data are in the form, sometimes termed quantal, 
in which out of n independent trials a particular response, e.g. the 
death of an experimental animal, or the growth of a microbial culture, 
is observed a times, and in which such pairs of values n and a have 
been obtained under a variety of conditions, the variation being perhaps 
to some extent deliberately imposed, as by a variation of dosage, and 
to some extent out of experimental control, as is the variation in response 
of different batches of material. 

In such cases it is usually desirable to interpret each pair of values 
as supplying information about a variate functionally connected with 
the probability of which a/n is an empirical estimate, and such that it 
is, so far as possible, additive in respect of the effects of varying condi- 
tions, and linear in deliberately imposed measures such as dosage, 
when these are given an appropriate metric. 

Examples of such transformations of the probability, which have
been widely used, are

$$p = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}u^2}\, du \qquad (1)$$

where x, or, for computational convenience sometimes x + 5, is termed
the Probit;

$$p = \sin^2 \phi \qquad q = \cos^2 \phi \qquad (2)$$

where $\phi$ runs from 0 to 90°,
or, sometimes using the double angle,

$$2p = 1 - \cos\theta \qquad 2q = 1 + \cos\theta$$

where $\phi$, or $\theta$ is spoken of as an Angular transformation; the logistic
transformation

$$z = \tfrac{1}{2} \log (p/q) \qquad (3)$$

where z, or sometimes 2z, has been termed the Logit; the log log
transformation,

$$x = \log \log (1/p), \qquad (4)$$


and a variety of others appropriate to special situations, such as that 
devised for the interpretation of gene ratios in a situation involving 
diffusion and selection in equilibrium, defined by the differential equation 


$$\frac{d^2 p}{dx^2} = 4pqx \qquad (5)$$

with the “boundary” conditions

p = ½,

defining a function x of p which I have termed a Legit (7).

No doubt many other such transformations will be developed for 
special purposes. The five mentioned have, however, all been the sub- 
ject of mathematical investigation, and the necessary numerical tables 
have been supplied for their convenient use. 
2. The maximum likelihood procedure. 


The practical procedure of fitting by Maximum Likelihood as a 
means of obtaining a correct analysis of variance is an application of 
the general method of efficient scores. The probability of what has 
been observed in any set of n trials is 


$$\frac{n!}{a!\,(n - a)!}\; p^a q^{n-a}$$

and the log likelihood, so far as concerns the unknown p, is

$$a \log p + (n - a) \log q;$$

then its rate of change for variation of the transformed variate, say x, is

$$\left(\frac{a}{p} - \frac{n - a}{q}\right)\frac{dp}{dx},$$

which is the efficient score with respect to x for this set of n observations.
We notice that the n trials may be scored individually, with scoring
coefficients

$$\frac{1}{p}\frac{dp}{dx}$$

for the events with expected frequency p, called “successes”, and

$$-\frac{1}{q}\frac{dp}{dx}$$

for the alternative events, or “failures”. The mean of such scores is
zero, for

$$p\left(\frac{1}{p}\frac{dp}{dx}\right) + q\left(-\frac{1}{q}\frac{dp}{dx}\right) = 0,$$

and their mean square, the Amount of Information, is

$$i = p\left(\frac{1}{p}\frac{dp}{dx}\right)^2 + q\left(\frac{1}{q}\frac{dp}{dx}\right)^2 = \frac{1}{pq}\left(\frac{dp}{dx}\right)^2.$$

This is the amount of information about x for a single trial; for the set
of n trials it comes naturally to a value n times as great.


Corresponding with any set of observations we may now construct
a variate

$$y = \frac{1}{ni}\left(\frac{a}{p} - \frac{n - a}{q}\right)\frac{dp}{dx} = \frac{(a - np)}{n\,\dfrac{dp}{dx}}$$

by dividing the score by the amount of information on which it is
based, which gives the linear adjustment required by the observations
to any proposed value x. The variance of this variate will be exactly
$1/ni$, when a takes the binomial distribution

$$(p + q)^n;$$

so in further analysis the variate will be given the weight $ni$. Parameters
such as regression coefficients, class differences, etc., fitted by using
such variates with their proper weights will necessarily satisfy the
conditions of maximal likelihood. For example, if the transformed
values, x, are believed to be linearly related to some observable, t,
with an equation

$$x = A + Bt$$

in which A, B are to be adjusted to suit the data, we may note that

$$\frac{\partial x}{\partial A} = 1, \qquad \frac{\partial x}{\partial B} = t$$

so that A, B will take the pair of values of maximal likelihood, when
the sum of the scores, and the sum of the products of each score by the
corresponding t, are both zero. In other words, when

$$\sum wy = 0, \qquad \sum wyt = 0$$

where y is the working deviation from the expected value x.

Using as weights the reciprocals of the exact variances of the variate
in each set, all residual sums of squares are $\chi^2$ values of the appropriate
degrees of freedom, and as such are available to test any questionable
aspect of the hypothetical formulation on which the analysis is based.
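As an illustration of the working variate and its weight, the sketch below takes the logistic transformation, assuming the form z = ½ log(p/q) for equation (3), so that dp/dz = 2pq. The function name and the numerical example are illustrative and are not taken from the paper.

```python
import math


def logit_working_variate(a, n, z):
    """Working deviate and weight for a successes in n trials at provisional logit z."""
    p = 1.0 / (1.0 + math.exp(-2.0 * z))    # inverse of z = (1/2) log(p/q)
    q = 1.0 - p
    dp_dz = 2.0 * p * q
    info = dp_dz ** 2 / (p * q)             # amount of information per trial, i
    score = (a / p - (n - a) / q) * dp_dz   # efficient score for the whole set
    y = score / (n * info)                  # equals (a - n p) / (n dp/dz)
    return y, n * info                      # variate and its weight n i


y, weight = logit_working_variate(a=12, n=20, z=0.0)
print(f"working deviate = {y:.2f}, weight = {weight:.1f}")
```

A fitting routine would add y to the provisional value z to obtain the working logit, regress it on the observable with the returned weight, and repeat until the weighted sums of y and of y times the observable vanish.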


3. Practical apparatus. 

For each type of transformation used, we need only tabulate against
x,
the maximum working value

$$x + \frac{q}{dp/dx},$$

the minimum working value

$$x - \frac{p}{dp/dx},$$

the weighting coefficient

$$\frac{1}{pq}\left(\frac{dp}{dx}\right)^2.$$

So for probits, if z stands for dp/dx, the maximum and minimum working probits are

$$x + \frac{q}{z}, \qquad x - \frac{p}{z},$$

the range is

$$\frac{1}{z},$$

and the amount of information is

$$\frac{z^2}{pq}.$$
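A table of this kind is easily generated. The sketch below assumes that x is the normal deviate itself, without the offset of 5 conventionally added to probits in published tables; the function names are illustrative. It also uses the fact that the working value for a successes out of n is the minimum working value plus the fraction a/n of the range.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def probit_table_row(x):
    """Maximum and minimum working values, range and weight at expected x."""
    p = normal_cdf(x)
    q = 1.0 - p
    z = normal_pdf(x)  # z stands for dp/dx
    return {
        "maximum": x + q / z,
        "minimum": x - p / z,
        "range": 1.0 / z,
        "weight": z * z / (p * q),
    }


def working_probit(x, a, n):
    """Working value for a successes in n trials: minimum + (a/n) * range."""
    row = probit_table_row(x)
    return row["minimum"] + (a / n) * row["range"]


for x in (-1.0, 0.0, 1.0):
    print(x, probit_table_row(x))
```

Note that a batch in which all or none of the subjects respond still receives a finite working value, namely the tabulated maximum or minimum.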


Again, if instead of p being given explicitly in terms of x, x is given
in terms of p, as for example,

$$-x = \log(-\log p),$$

then

$$\frac{dx}{dp} = -\frac{1}{p \log p},$$

and so,

$$\frac{dp}{dx} = -p \log p, \qquad i = \frac{p}{q}(\log p)^2,$$

and the maximum and minimum working variates are

$$x - \frac{q}{p \log p}, \qquad x + \frac{1}{\log p},$$

remembering that log p is always negative.
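For this transformation the working quantities need nothing beyond natural logarithms. The following sketch assumes the relation −x = log(−log p), so that p = exp(−e^(−x)), and evaluates the weight (p/q)(log p)^2 and the working limits x − q/(p log p) and x + 1/log p; the function name is illustrative.

```python
import math


def loglog_row(x):
    """Weight and working limits for the transformation -x = log(-log p)."""
    log_p = -math.exp(-x)  # log p, always negative
    p = math.exp(log_p)
    q = 1.0 - p
    return {
        "p": p,
        "slope": -p * log_p,                 # dp/dx
        "weight": (p / q) * log_p ** 2,      # amount of information per trial
        "maximum": x - q / (p * log_p),
        "minimum": x + 1.0 / log_p,
    }


for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, loglog_row(x))
```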

Tabular apparatus for (1) Probits and (2) angular values have been 
published in Statistical Tables, (8) where are also given the formulae, 
which need no special tabulation, for (3) Logits. For (5) Legits, tables 
in similar form have been published more recently (7) in Biometrics. 
I do not know that the values for (4) given by the formulae above 
have been tabulated, but in any case they are not difficult to calculate 
using only a table of natural logarithms. If u is log dose taken to the 
base e, of a living infective agent, and if each living particle is supposed 
to have an equal and independent chance of establishing an infection, 
then the regression of x on u will be linear with regression coefficient
equal to unity. The experimental verification of this equality is pro
tanto confirmation of the theory that this probability is independent
of the subject, as in another way is the $\chi^2$ value of the sum of squares of
deviations from the fitted line. 




4. “Corrections” to angular and square root transformations. 


Just as in the limit for large sample size and small probability the
binomial distribution,

$$(q + p)^n,$$

tends to the Poisson limit with

$$m = pn,$$

so the angular transformation

$$p = \sin^2 \varphi$$

becomes

$$x = \sqrt{m},$$

where x from different populations is proportional to the original
angular measure $\varphi$. The working variate is then

$$\sqrt{m} + \frac{a - m}{2\sqrt{m}} = \frac{a + m}{2\sqrt{m}},$$

where a is the observed and m the expected frequency; the weighting
coefficient is a constant, 4.

It should be emphasized that if an analysis of variance is in view 
the choice of what transformation to use should be governed by the 
prospective additiveness of the transformed variate when various con- 
trollable, or uncontrollable factors, the effects of which are to be analysed, 
are varied. In the author’s view, conformity with this and other 
presuppositions of the method chosen, after accurate fitting, can be 
satisfactorily confirmed by a series of $\chi^2$ tests. Significant heterogeneity
appearing at this stage in spite of exact analysis gives serious grounds 
for doubting the appropriateness either of the transformation, or of 
the adequacy of control of the experimental material. 

In the use of the angular and square root transformations the near 
constancy of the variances due to purely sampling error of the trans- 
formed variates has exercised a certain fascination, and has sometimes 
seemed to be the reason for choosing this type of transformation. 

The fact is that the amount of information about $\varphi$,

$$\frac{1}{pq}\left(\frac{dp}{d\varphi}\right)^2,$$

reduces, when p is equated to $\sin^2 \varphi$, to

$$\frac{(2 \sin\varphi \cos\varphi)^2}{\sin^2\varphi \cos^2\varphi} = 4,$$

and is exactly constant for all values of $\varphi$. The variance of the working
angle is therefore absolutely constant.

Similarly, the amount of information about $\sqrt{m}$, where m is the
parameter of the Poisson series, is

$$\frac{1}{m}\left(\frac{dm}{d\sqrt{m}}\right)^2 = 4,$$

and the variance of the working value

$$\sqrt{m} + \frac{a - m}{2\sqrt{m}} = \frac{a + m}{2\sqrt{m}}$$

is exactly 1/4, for all values of m.
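Both statements of exact constancy can be verified by direct summation over the distributions. The sketch below is an illustration with invented function names: it computes the variance of the working angle φ + (a − np)/(n dp/dφ) under the binomial distribution, which should equal 1/(4n), and the variance of the working value (a + m)/(2√m) under the Poisson distribution, which should equal 1/4. The Poisson sum is truncated at a large count, an approximation of negligible effect for the means tried.

```python
import math


def variance(values, probs):
    """Variance of a discrete variate given its values and probabilities."""
    mean = sum(v * pr for v, pr in zip(values, probs))
    return sum(pr * (v - mean) ** 2 for v, pr in zip(values, probs))


def working_angle_variance(n, phi):
    """Exact variance of the working angle when p = sin^2(phi)."""
    p = math.sin(phi) ** 2
    slope = 2.0 * math.sin(phi) * math.cos(phi)  # dp/dphi
    probs = [math.comb(n, a) * p ** a * (1.0 - p) ** (n - a)
             for a in range(n + 1)]
    values = [phi + (a - n * p) / (n * slope) for a in range(n + 1)]
    return variance(values, probs)


def poisson_probs(m, upper):
    """Poisson probabilities for counts 0..upper, built by recurrence."""
    probs = [math.exp(-m)]
    for a in range(1, upper + 1):
        probs.append(probs[-1] * m / a)
    return probs


def working_root_variance(m, upper=200):
    """Variance of the working value (a + m) / (2 sqrt(m))."""
    probs = poisson_probs(m, upper)
    values = [(a + m) / (2.0 * math.sqrt(m)) for a in range(upper + 1)]
    return variance(values, probs)


for phi in (0.3, 0.8, 1.2):
    print("angle", phi, working_angle_variance(10, phi), 1.0 / 40.0)
for m in (0.5, 2.0, 10.0):
    print("root", m, working_root_variance(m))
```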

For a normal distribution the amount of information is the inverse 
of the variance; for other distributions this reciprocal equivalence does 
not hold, and the constancy of the amount of information supplied 
about $\sqrt{m}$ does not imply that the variance of the distribution of $\sqrt{a}$,
where a is a Poisson variate will be constant. Such a connection is 
only to be looked for in large samples where the Poisson distribution 
approaches the normal. 

In spite of the exact constancy of the amount of information which
should, I think, have served as a warning, certain authors (a) thinking
that the approximate constancy of the variance of $\sqrt{a}$ was the object
of the transformation, and (b) observing that such constancy is
imperfect, have suggested various troublesome modifications, which have
now been available for some time. Thus in 1936 Professor M. S.
Bartlett (2) in a paper entitled The square root transformation in the
analysis of variance, proposed the use of $\sqrt{a + 1/2}$, when a is an
observed variate, and therefore a sufficient estimate of the parameter
m of a Poisson Series, and pointed out that its sampling variance was
somewhat more constant than that of $\sqrt{a}$. Twelve years later, F. J.
Anscombe in Biometrika (6) indicated that $\sqrt{a + 3/8}$ was even
better, at least when m is large, a result he ascribed to A. H. L. Johnson.
Neither author seemed to realise that in the correct process of using
these transformations, as set out above, the variance of each working
value is exactly equal to 1/4, and needs no adjustment.
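The behaviour of the three empirical transforms can be examined by summation over the Poisson distribution. The following sketch, with an illustrative function name and a truncated sum, prints the variances of √a, √(a + 1/2) and √(a + 3/8) for several values of m, so that each may be compared with the constant 1/4 attaching to the working value.

```python
import math


def root_variance(m, c, upper=400):
    """Variance of sqrt(a + c) when a is a Poisson variate with mean m."""
    prob = math.exp(-m)
    probs = [prob]
    for a in range(1, upper + 1):
        prob = prob * m / a
        probs.append(prob)
    values = [math.sqrt(a + c) for a in range(upper + 1)]
    mean = sum(v * pr for v, pr in zip(values, probs))
    return sum(pr * (v - mean) ** 2 for v, pr in zip(values, probs))


for m in (1.0, 2.0, 5.0, 20.0):
    print(m,
          root_variance(m, 0.0),        # sqrt(a)
          root_variance(m, 0.5),        # sqrt(a + 1/2)
          root_variance(m, 3.0 / 8.0))  # sqrt(a + 3/8)
```

None of the three is exactly constant at small m, which is the point at issue in the text.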

It is difficult to judge just what influenced Bartlett in putting forward 
his proposal for adjusting the Poisson variate. In his 1936 paper he 




compares the addition of 1/2 to the variate before taking the square
root, to Yates’s correction for continuity, which Yates had introduced
for tests of significance with $\chi^2$ having one degree of freedom, but the
comparison is very tenuous. Yates had the exact test of significance
at the time, and could demonstrate empirically that his adjustment,
(a) was easy to apply, and (b) did in fact improve the test of significance.
Bartlett does not attempt to show that an improved analysis of variance
results from his adjustment. In the more general case of the angular
transformation he suggests

$$\sin^{-1}\sqrt{(a \pm \tfrac{1}{2})/n},$$

the signs being determined by whether a is less or greater than n/2.
This awkward form was replaced by Anscombe (1948), by the more
rational proposal to use

$$\sin^{-1}\sqrt{\frac{a + 3/8}{n + 3/4}},$$

which is at least consistent over the whole range of observations.
In seeking a transformation having constant variance, Bartlett 


may also have been influenced by the transformation of the correlation 
coefficient, 


r = tanh z, 


which I had shown in 1921 (1) to give distributions for z sufficiently 
nearly normal for the use of the Gaussian distribution in tests of sig- 
nificance and sufficiently constant in variance for the (unknown) true 
correlation to be an unimportant factor in such tests. These advantages 
are such that for practical purposes, tabulation of the exact distribution 
has been entirely unnecessary, but to suppose that there are correspond- 
ing advantages in attempting to make the variance of some function 
of a Poisson variate as constant as possible, suggests that the non- 
normal character of this discontinuous distribution has been ignored, 
and even that it was proposed to use the empirical transforms as variates 
in the final analysis. 

The unsuitability of using empirical transforms was early made
clear in the case of Probits, where experiments in which all of the tests
react alike would empirically be given infinite variates with zero weights.
The exact treatment was given in 1935 in an Appendix to C. I. Bliss (9).
The full table for obtaining the correct working values was given by
Bliss (3) in 1938, The determination of the dosage mortality curve from
small numbers. In the same year Fisher and Yates (8), Tables for
Statisticians, gave corresponding tables both for the Probit, and for
the angular transformation, in which the need for the use of a correct
working variate had not been forced on the notice of statisticians by
the appearance of infinite values.

In 1940 W. G. Cochran (4) considered Bartlett’s adjustment in a 
paper in the Annals of Mathematical Statistics. He refers to the 
method exhibited in the preface of Tables for Statisticians, and for which 
provision had been made in tables XII and XIV, and evidently recog- 
nises this as more correct than the use of empirical angles. Yet he 
seems to assume that the latter may be used without inaccuracy save 
in special and particularly in terminal cases, for which however he 
mentions and does not totally disavow the proposal (11) to substitute 
1/4 for 0 and n — 1/4 for n. It is a great pity that Cochran in this 
paper does not clearly point out that such adjustments have no useful 
function, at least finally, if it is intended to perform a correct analysis. 
The subsequent papers (5, 6) by Bartlett (1947) and Anscombe (1948), 
show no such consciousness of the situation as they would have obtained 
had Cochran expressed himself more definitely. 

Arising from the idea that the empirical transforms can be used in 
the final analysis, instead of being always of a tentative and provisional 
character, is the emphatically advocated notion that great differences in 
computational effort are required in the use of different transformations. 
In particular the logistic transformation has been advocated by J. 
Berkson, as though this were a major consideration, and in a recent 
review of Finney’s Probit Analysis (J.A.S.A. 47, 687), K. A. Brownlee (10)
repeats Berkson’s extraordinary claim that the logistic curve
can be fitted thirty times as rapidly as the normal. I have fitted many 
cases of both over the last fifteen years, and there is little to choose 
between the two procedures, in cases that require careful fitting, i.e. 
when the different test batches are broken up in small groups, as must 
often happen when many factors are brought into the analysis. It is 
true that the working Logits, and their precision are given by simple 
formulae which need no special tabulation beyond the use of a readily 
available table of hyperbolic tangents, but the work of calculating each 
value is not less or appreciably different from that of looking up the 
values appropriate to other transformations, in the tables already 
available. In no case are the computations unduly onerous, and they
are as likely to be lengthy with logits as with the other transforms, if the
number of classes is large. To choose one transformation rather than 
another on the supposition that the labour will be less, without regard 
to its conformity with theoretical considerations, seems to be a very 
mistaken policy, seeing that the estimates, which are always an intrinsic 
part of the analysis of variance, are in such cases estimates only of 




mathematical artifacts. The appropriateness of our choice is, however, 
open to confirmation by the $\chi^2$ test, and where this test shows the
angular transformation to have been usually successful in like material, 
we may gain some real computational advantage by assigning in advance 
equal or proportional weights to the different entries in a two-way or 
three-way analysis. 


BIBLIOGRAPHY 


(1) R. A. Fisher (1921). On the “probable error” of a correlation coefficient
deduced from a small sample. Metron I, 4, 1-82.
(2) M. S. Bartlett (1936). The square root transformation in the analysis of
variance. J. Roy. Stat. Soc., Suppl., 3.1, 68-78.
(3) C. I. Bliss (1938). The determination of the dosage mortality curve from small
numbers. Q. J. of Pharm. and Pharmacol., 11, 192-216.
(4) W. G. Cochran (1940). The analysis of variance when experimental errors
follow the Poisson or Binomial laws. Annals Math. Stat., 11, 335-347.
(5) M. S. Bartlett (1947). The use of transformations. Biometrics, 3, 39-52.
(6) F. J. Anscombe (1948). The transformation of Poisson, Binomial and
negative-binomial data. Biometrika 35, 246-254.
(7) R. A. Fisher (1950). Gene frequencies in a cline determined by selection and
diffusion. Biometrics, 6, 353-361.
(8) R. A. Fisher and F. Yates (1938). Statistical Tables for Biological, Agricultural
and Medical Research. Oliver and Boyd.
(9) C. I. Bliss (1935). The calculation of the dosage-mortality curve. Appendix by
R. A. Fisher. Ann. Applied Biol., 22, 134-167.
(10) K. A. Brownlee (1952). Review of Finney’s Probit Analysis. J.A.S.A., 47, 687.
(11) C. Eisenhart (1947). Selected techniques of statistical analysis. Columbia Univ.,
New York.


DISCUSSION OF 
THE ANALYSIS OF VARIANCE WITH VARIOUS BINOMIAL 
TRANSFORMATIONS 


BY SIR RONALD FISHER


A COMMENT ON THE USE OF THE SQUARE-ROOT AND ANGULAR
TRANSFORMATIONS. By M. S. Bartlett.


After a careful reading of Sir Ronald Fisher’s stimulating paper: 
“The analysis of variance with various binomial transformations”, my 
impression is that our respective viewpoints on the use of the square-root 
and angular transformations are less inconsistent than Fisher himself 
at times seems to imply. A very clear explanation of when the exact 
analyses to which Fisher refers are justifiable is already available in the 
admirable paper by Cochran (4)*, so that little further comment need 
be made. However, the reader might note that my own references 
((2), (5), and J. Roy. Statist. Soc. (Suppl.) 4 (1937) p. 168) to the use of 
these two transformations have always been in connection with field 
and laboratory experiments, of a routine type, where heterogeneity above 
the basic Poisson or binomial variation was present. In such cases an 
empirical analysis on a scale for which the variance was stabilized was, 
in spite of Fisher’s remarks, a great advantage, provided the use of such 
a scale did not clash with the other requirements of an efficient analysis. 
This situation is quite distinct from the situation considered by Fisher, 
where the theoretical specification of the data is assumed so completely 
known that a full likelihood analysis is feasible. The presence of hetero- 
geneity of a less specified type may be unwelcome, and in laboratory 
experimentation may be a warning of lack of control (cf. my remarks in
my 1937 paper, loc. cit.), but in many field trials at least this may be no 
fault of the experimenter. 

*Numbered references correspond to the bibliography at the end of Fisher's paper.

Poisson and binomial variation was discussed in my first paper (2) as
a limiting case of this more general situation where such heterogeneity
was envisaged; in this limiting case some justification of the use of
$\sqrt{x + 1/2}$ in preference to $\sqrt{x}$ on efficiency grounds, as well as on
variance stability, was given there. In the case of the $\sin^{-1}\sqrt{x}$
transformation, the awkward corresponding adjustment referred to by Fisher
was later (1937, loc. cit.) amended to the use of 1/4 for 0 (and n − 1/4 for
n in the case of 100%). Fisher himself refers for this adjustment to
(11), where a much more detailed discussion of its properties will indeed
be found.* Incidentally, the use of 1/4 for 0 could of course be used in
place of $\sqrt{x + 1/2}$ or $\sqrt{x + 3/8}$ for Poisson data; the over-all gain
over $\sqrt{x}$ in efficiency and variance stability is comparable, though the
last transformation, as would be expected from Anscombe’s work (6), is
still somewhat superior for variance stability. It might, however, be
noticed that if the minimum efficiency for homogeneous data of about
96% with these various empirical adjustments is not to be further
reduced by real differences between block means, the bias in the mean
should either be zero or at least remain constant on the transformed scale
as the true mean m varies. The variable $\sqrt{x + 1/2}$ appears slightly less
satisfactory in this respect for very small m.

In my own experience of actual data where such transformations 
might be useful, I have always found that heterogeneity prohibited any 
exact analysis of the type recommended by Fisher. In Cochran’s paper 
two examples were given where the presence of heterogeneity was not 
so clear-cut as to rule out the possible value of such an analysis; but even 
then its justification was still questionable, and the use of the more 
empirical but simpler adjustments referred to above might be considered 
adequate. Moreover, even if the data are otherwise homogeneous, the 
occasions when the square root or $\sin^{-1}\sqrt{x}$ scale is a suitable one for
additivity of treatment effects does not appear to me common enough 
for the use of such scales to be justified very often on these grounds 
rather than on variance stability. Thus what practical value these two 
transformations may have has always seemed to me to rest on a more 
empirical basis than, for example, the probit or the logarithmic trans- 
formation. 


Comments. By F. J. Anscombe.


The technique described by Sir Ronald Fisher relates to the estima- 
tion by maximum likelihood of the unknown parameters in a response 
law. The technique involves the transformation of the observations 
into “working values”, which are functions not only of the observations 
but of the maximum-likelihood estimates of the unknown parameters; 
and since one of the objects of making the transformation is to find those 
maximum-likelihood estimates, successive approximation is generally 
necessary. 


*Cf. also “The angular transformation in quantal analysis” by P. J. Claringbold, J. D. Biggers
and C. W. Emmens, Biometrics, 9 (1953), 467.




The square-root and angular transformations studied by Professor 
M. S. Bartlett are simple functions of the observations only. They do
not depend on estimates of unknown parameters, and once calculated
do not have to be recalculated. Such transformations may be used in
two sorts of situation: (1) when there is no question of estimating a 
response law, and (2) when a response law is to be estimated and a 
quicker method than Fisher’s is required. 

The following is an example of the first sort of situation. The eel- 
worm population of a field is estimated by taking a soil sample (bulked 
from numerous borings), mixing it and extracting four small equal quan- 
tities of soil, in which the eelworm cysts are counted by a flotation 
method. If mixing and counting were perfect, we should expect the 
counts made from any one field to follow a Poisson distribution. But if 
the mixing is bad we may expect more than Poisson variation in the 
counts, while if counting is bad we may get less than Poisson variation. 
It is therefore of some interest to test the variability of the counts within 
fields. The counts may look like this: 


First field: 3, 0, 4, 3 
Second field: 7, 7, 7, 
Third field: 4, 
Fourth field: 12, 15, 13, 19 
Fifth field: 0, 1, 4, 2- 


~I 


And so on. The fields have different infestations, and there is no ques- 
tion of a “response law” relating infestation with the number of the field. 

One possible procedure of analysis is to calculate for each field the
sum of squares of the counts about the mean, and divide by the mean
count. This has approximately the $\chi^2$ distribution with three degrees
of freedom, and exact expressions for the first few moments are available
(Haldane, 1937), under the assumption that the variation in the counts
follows the Poisson distribution. Another possible procedure of analysis
is to transform the counts so that the population variance becomes a
known quantity independent of the mean if the Poisson assumption is
true (square-root transformation), and then to calculate for each field
the sum of squares of the transformed counts about the mean, divided
by the theoretical variance of the transformed counts. This also has
approximately the $\chi^2$ distribution with three degrees of freedom under
the Poisson assumption, and the exact moments can be studied
numerically (Anscombe, 1948). These two alternative tests have roughly equal
power, and the $\chi^2$ approximation for the distribution of the criterion is




about equally accurate. It seems to me, therefore, that one may safely
choose whichever test is the more convenient. The technique described 
by Fisher does not seem to me to have any relevance here. 
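The two criteria are simple to compute side by side. The sketch below uses the counts of the first and fourth fields listed above; the function names are illustrative, and the particular form √(count + 3/8) with theoretical variance 1/4 is an assumption made for the example, the text specifying only a square-root transformation with known variance. Each criterion is to be referred to the $\chi^2$ distribution with three degrees of freedom.

```python
import math


def dispersion_criterion(counts):
    """Sum of squares of the counts about their mean, divided by the mean."""
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / mean


def root_scale_criterion(counts, c=3.0 / 8.0, theoretical_variance=0.25):
    """Sum of squares of sqrt(count + c) about the mean, over the variance.

    The constant c = 3/8 and the variance 1/4 are assumptions of this sketch.
    """
    roots = [math.sqrt(x + c) for x in counts]
    mean = sum(roots) / len(roots)
    return sum((r - mean) ** 2 for r in roots) / theoretical_variance


fields = {
    "first": [3, 0, 4, 3],
    "fourth": [12, 15, 13, 19],
}
for name, counts in fields.items():
    print(name, dispersion_criterion(counts), root_scale_criterion(counts))
```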

As an example of the second sort of situation, consider any toxicity 
experiment in which equal numbers of insects (n, say) are exposed to 
various doses of an insecticide, the number r killed being noted each time. 
Various alternative response laws have been proposed for such experi- 
ments. For example, there are the normal-integral law which gives rise 
to the method of probits, and the logistic law which gives rise to the 
method of logits, and the trigonometric law which gives rise to the 
angular transformation and associated adjustments tabulated in Fisher 
and Yates’ Tables (XII-XIV). So far as I know, no one has given
convincing experimental proof that any one law holds accurately and is 
definitely superior to the others. Armitage and Allen (1950) found that 
the probit and logit methods sometimes gave better fits than the angle 
method; but the evidence they considered is not very large in quantity, 
and is possibly untypical, having been taken from published studies. 
Since probably none of these laws is much more correct than any other, 
it seems reasonable to choose the one that can be fitted most easily. One 
possible claimant as the easiest law to fit is the logistic, as fitted by 
Berkson (1953). Another such law is one for which an appropriate 
method of fitting involves the angular transformation as described by 
Bartlett and slightly modified by me. If we transform from r to y, where

$$y = \sin^{-1}\sqrt{\frac{r + 3/8}{n + 3/4}},$$

we suppose the response law to be such that the expected value of y is
exactly linear in the dose (or log-dose) x, that is, say,

$$E(y) = \alpha + \beta x.$$


If p is the probability that any one insect is killed at dose x, we have 


approximately 
2 + — }) 
E(y) = sin”! 4(2p | (1) 
n+ 4 
so that the response law is approximately 
1— 2 
pt+ = sin’ (a@ + Bx). (2) 


(We assume this to hold except where p is very close to 0 or 1.) This 
seems just as plausible as Fisher’s assumption 


$$p = \sin^2(\alpha + \beta x); \qquad (3)$$




and it is much easier to fit, because we merely perform an ordinary
unweighted regression of y on x to estimate $\alpha$ and $\beta$, and we only do this
once; there is no successive approximation, as there is for Fisher's
response law with maximum-likelihood fitting.
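The single-pass procedure may be sketched as follows. The transformation is taken to be the modified angular form sin⁻¹√((r + 3/8)/(n + 3/4)) of Anscombe (1948), which is an assumption about the formula intended here; the doses and kills are hypothetical, and the function names are invented for the illustration.

```python
import math


def modified_angle(r, n):
    """Angular transform of r kills out of n, in radians (assumed form)."""
    return math.asin(math.sqrt((r + 3.0 / 8.0) / (n + 3.0 / 4.0)))


def unweighted_line(x, y):
    """Ordinary least-squares intercept and slope of y on x."""
    k = len(x)
    x_bar = sum(x) / k
    y_bar = sum(y) / k
    s_xx = sum((x_i - x_bar) ** 2 for x_i in x)
    s_xy = sum((x_i - x_bar) * (y_i - y_bar) for x_i, y_i in zip(x, y))
    beta = s_xy / s_xx
    return y_bar - beta * x_bar, beta


# Hypothetical experiment: n = 20 insects at each of five log-doses.
n = 20
log_dose = [0.0, 1.0, 2.0, 3.0, 4.0]
killed = [2, 5, 10, 15, 18]

y = [modified_angle(r, n) for r in killed]
alpha, beta = unweighted_line(log_dose, y)
print(alpha, beta)
```

Because the transformed values depend on the observations alone, the regression is computed once and no recalculation of the y values is needed.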

It may be objected that the law (2) involves n, which ought not to 
enter into a “true” response law. To this I would reply that no-one 
knows what the true law is, and by suitable choice of $\alpha$ and $\beta$ the law (2)
could be made to fit about as well as anything else. However, if the 
presence of n in (2) is considered objectionable (as indeed it would be if n 
were not the same for each dose), the transformation can be changed to 


y = sin’ + 


al 


for which the variance is slightly less well stabilized, but which is appro- 
priate for Fisher’s law (3), since we now have, in place of (1), the 
approximate relation 


$$E(y) = \sin^{-1}\sqrt{p}.$$


REFERENCES 
Anscombe, F. J. (1948). Biometrika, 35, 246-254. 
Armitage, P. and Allen, I. (1950). J. Hygiene, 48, 298-322. 
Berkson, J. (1953). J. Amer. Statist. Assoc., 48, 565-599. 
Fisher, R. A. (1954). Biometrics, 10, 130-140.
Haldane, J. B. S. (1937). Biometrika, 29, 133-148. 


THE RELATION BETWEEN SIMPLE TRANSFORMS AND MAXIMUM LIKELI- 
HOOD SOLUTIONS. By W. G. Cochran. 


Sir Ronald Fisher gives an elegant and compact presentation of the 
application of maximum likelihood estimation to analysis of variance 
problems that involve binomial data. Since my 1940 paper has been 
referred to, I can best contribute to the discussion by summarizing the 
argument in that paper, and commenting on it in retrospect. I should 
perhaps remark here and now that my 1940 paper was not at all concerned 
with adjusted transforms such as $\sqrt{a + 1/2}$, except for a reference to an
empirical rule suggested by Bartlett for the angular transformation.
The adjective “adjusted” is used in my paper, but it denoted the
maximum likelihood solutions.

At the time of my 1940 paper, the square root and angular
transformation had been advocated as devices for applying the analysis of
variance to Poisson and binomial data respectively. The object of my 




paper, as stated, was “to discuss the theoretical basis for these trans- 
formations in more detail, and in particular to examine their relation 
to a more exact analysis.” 

With data of the Poisson type, I first gave the general form of the 
maximum likelihood equations, but expressed the opinion that the 
numerical solution would be too complicated for frequent use because of 
the relatively large number of unknown parameters which usually enter 
into the specification of an analysis of variance problem. 

I showed, however, that if the effects of treatments, blocks, etc., are
additive on a square-root scale, the maximum likelihood analysis can be
performed in a series of successive approximations, by means of the
method based on working square roots which Fisher exhibits in section
4 of his paper. The approximate $\chi^2$ test of goodness of fit was also
outlined and a numerical example was worked, using the square roots of 
the original data as the first step in the approximation. 

The same procedure was then developed for binomial data in which 
effects are linear on the angular scale. The analysis in terms of working 
angles, as given in section 4 of Fisher’s paper, was illustrated by a 
numerical example. 

In short, with Poisson or binomial data, the square root or angular 
transformation, respectively can be regarded as the first step in a 
maximum likelihood analysis that is valid when effects are additive in 
the transformed scale. 
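The sense in which the simple square roots form the first step can be seen in the most trivial layout, a set of counts sharing a single Poisson mean. The sketch below is an illustration only, with an invented function name: it starts from the mean of the square roots, then repeatedly replaces the trial value x of √m by the mean of the working values (a + x²)/(2x), all of which carry the same weight 4. The sequence settles on the square root of the mean count, which is the maximum likelihood value of √m in this case.

```python
import math


def working_root_cycles(counts, cycles=10):
    """Successive approximations to sqrt(m) for counts with a common mean.

    Assumes at least one non-zero count, so that the trial value is never 0.
    """
    x = sum(math.sqrt(a) for a in counts) / len(counts)  # first step
    history = [x]
    for _ in range(cycles):
        working = [(a + x * x) / (2.0 * x) for a in counts]
        x = sum(working) / len(working)  # equal weights, so a plain mean
        history.append(x)
    return history


counts = [12, 15, 13, 19]  # counts of the kind met in a single plot or field
history = working_root_cycles(counts)
print(history[0], history[-1], math.sqrt(sum(counts) / len(counts)))
```

With several treatments and blocks the plain mean is replaced by the usual analysis of variance fit on the square-root scale, but the cycle is otherwise the same.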

In the final sections of the paper, the practical utility of the iterative 
process versus that of approximate analysis in the square root or angular 
scales was discussed. I pointed out that the iterative processes lead to 
maximum likelihood solutions only if (i) effects are additive in the trans- 
formed scale and (ii) the residual variation is solely of the Poisson or 
binomial type. If either of these conditions fails, the iterative process 
loses its claim to be an exact solution. This warning is relevant because 
there seems no a priori reason for expecting that effects will be additive
in the square root or angular scale, although the assumption of additivity
may be a satisfactory approximation. A more important restriction on
the assumptions, in my own experience, is that biological data which
appear to be of the Poisson or binomial type often contain additional
variation. The $\chi^2$ test is, of course, a safeguard against inappropriate
use of the iterative process. 

From experience with several sets of data that had been analyzed by 
both methods in preparation for the paper, I also pointed out that the 
results of the approximate analysis in the square root or angular scale 
seemed to agree well with the analysis based on the final stage of the 
iterative process, except for observations giving zero or very small values 




in the Poisson case, or zero or 100 per cent in the binomial case. These 
considerations led me to the following final summary statement, “In 
practice, these new methods are not recommended to supplant the simple 
transformations for general use, because it can seldom be assumed that 
the whole of the experimental error variation follows the Poisson or 
binomial laws. The more exact analysis may, however, be useful (i) for 
cases in which the plot yields are very small integers or the ratios of 
very small integers (ii) in showing how to give proper weight to an 
occasional zero plot yield.” 

In retrospect, I am inclined to agree with Fisher that these conclu- 
sions are too confident, because they were based on the analysis of only 
a small number of examples by both methods. I hope that one result of 
Fisher’s paper will be to stimulate the building-up of a substantial body 
of evidence on the frequency with which the assumptions required in 
the maximum likelihood analysis hold in practice and on the adequacy 
of analyses of the unadjusted square roots or angles. My own experience 
with such data since 1940 is still limited, but as far as it goes it has not 
led me to modify the conclusions. In the majority of examples that have 
come my way, the residual variance was considerably greater than the 
binomial or Poisson variance, so that the iterative procedure in my 1940 
paper did not apply. In the remainder, apart from the exceptions noted 
in my conclusion, analysis of the unadjusted transforms gave results 
close to the maximum likelihood solutions. 

Some further information on this point is supplied by a recent paper 
(Claringbold, Biggers and Emmens, Biometrics, 9, 475, 1953) which 
compares the angular transformation, using Bartlett’s empirical adjust- 
ment for zero responses, with two cycles of the maximum likelihood solu- 
tion. Data from 6 factorial experiments were analyzed by both methods. 
Summarizing these comparisons, the authors remark that the difference 
in results is “noticeable, although negligible for practical purposes.” 
They conclude that analysis of the angles “gives a very good basis for 
the maximum likelihood process and in many practical situations may be 
sufficiently accurate in itself—the decision, however, will rest with the 
investigator.” 

To summarize, an analysis of variance of the simple square roots or 
angles may be useful in two types of situation. The first concerns 
heterogeneous data in which the residual variance is greater than the 
binomial or Poisson variance, but in which we do not feel able to set up 
a mathematical specification as to the nature of this residual variation. 

A maximum likelihood solution, which requires this specification, cannot 
then be made. In using a transformed scale, the hope is that it will bring 
the data sufficiently close to additivity of effects, normality and equality 




of variances so that an ordinary analysis of variance in the transformed 
scale will be reasonably efficient. The second situation, as already men- 
tioned, is one in which the analysis of square roots or angles is regarded 
as a quick approximation to the maximum likelihood solution, with a 
saving of time at the expense of some loss of efficiency. 

Fisher’s attitude, if I do not mistake it, is that this saving in time 
is often overestimated, and is a poor return for having to use a method 
that is empirical in place of one that is well-grounded in theory. In this 
connection, I think that some distinction is appropriate between uses 
made by the professional statistician and uses made by the biologist. 
The professional statistician might be expected to give strong preference 
to a method of analysis for which there is a good theoretical foundation 
and to find that differences in computing times between approximate 
and well-grounded analyses are seldom great enough to be a major 
factor. The biologist, for understandable reasons, is much attracted by 
quick and simple computational methods, and an additional step or two 
that is trivial to the professional statistician can appear formidable to 
the biologist. It is my impression that the speed with which a new 
technique becomes widely used is considerably influenced by the sim- 
plicity or otherwise of the calculations that it requires. Next door to 
the lecture room in which the probit method is expounded one may still 
find the laboratory in which the workers compute their LD 50’s by the 
Behrens (Reed-Muench) method. And even the professional statistician 
is likely to look kindly on methods that may save some time when there 
are large amounts of routine data to be processed, or when he is in a 
hurry. Moreover, the element of personal taste seems to enter. We 
tend, I think, to make exaggerated claims for the computing methods 
which we happen to like, at the expense of those which we don’t like. 

The argument in favor of time-saving methods can, of course, be 
carried too far and can lead to opportunistic and second-rate teaching. 
But the argument has enough validity to justify the current widespread 
interest in speedy, if somewhat inefficient, methods of analysis. 

With regard to adjusted transforms, I have made some use of the 
transforms log (x + 1) and √(x + ½) in my own work, again on the
empirical grounds that these scales seemed to provide a slightly better 
discrimination of the treatment effects than the corresponding simple 
transforms. I agree, however, that emphasis on variance-stabilizing as 
an end in itself would be misplaced. 


Comments. By Joseph Berkson. 


I consented to comment on the remarks of Sir Ronald Fisher only 
with considerable reluctance. The passages of his article that have to
do with my work are so far out of the bounds of reasonableness or relevancy
that on first reading them I could only believe that he had been
misinformed regarding my statements.* On this basis I wrote to him, 
pointing out the possibility that he had misunderstood what I had said 
and asking him whether he might wish to withdraw his criticisms and so 
avoid controversy. His answer was in the very affirmative negative. 

Sir Ronald refers to “Berkson’s extraordinary claim that the logistic
curve can be fitted thirty times as rapidly as the normal” and then says,
“I have fitted many cases of both over the last fifteen years, and there is
little to choose between them.” The reader who is not familiar with the
vagaries of Fisher’s controversial writing is not likely to surmise that
I did not say what he said I said. I did not make the general statement
he attributes to me; the statement was, “The computers report that they
can accomplish thirty definitive minimum logit χ² solutions by the
method I have advanced (1) (6) in the time required to accomplish one
definitive maximum likelihood probit solution (4).” It need not be
doubted that Fisher has, as he says, fitted many cases of the logistic
function, but it is very questionable whether he has computed even one
minimum logit χ² estimate, which is what I was explicitly talking about,
as I am bound to say he must have been well aware, even when he com- 
posed his manuscript, but certainly after he received my letter, for that 
is what I respectfully pointed out to him. 

*He did not include in his bibliography a reference to any article of mine.

My statement was based on an experiment, fairly carefully per-
formed, involving a large series of representative samples from a study of
the comparative statistical characteristics of maximum likelihood and
minimum χ² estimates, in which the estimates were calculated to 5
significant figures, correct within ±5 in the last figure, which is the
minimum required for measuring the distinction between these estimates.
On seeing Sir Ronald’s challenge of its veracity, I decided to make a
check, and there has been time for only a couple of examples. The first
was taken from the tables of Fisher and Yates (9). The computer was
instructed to calculate the probit maximum likelihood estimate by the
method described in that volume, correct to 4 significant figures, which
is the number given by the authors, and to keep a careful account of the
time required. Preliminarily some time was wasted, in futilely trying
to use the tables of the volume in which the example is incorporated,
before deciding, as did Garwood (10) for the same example, that other
tables are necessary; we used the W. P. A. (12) tables. Four iterations
after the provisional solution were necessary,** yielding b = .7128,
X50 = 6.609, in fair agreement with the final value given by Fisher and
Yates.* The time required was 4 hours and 55 minutes. The minimum
logit χ² estimate was calculated in 12 minutes, and yielded X50 = 6.637.

**Iterations are continued until δa and δb are less than |5| in the (s + 1)th figure, where s is the
number of significant figures carried in the final estimates, so that the error in the sth figure of the
estimate as given should be less than |1|.

The second example was from Mather’s text (11), in which two itera- 
tions are accomplished from which the estimates as finally given are 
b = 8.2824, LD50 = 0.3660. We started with the author’s provisional
estimates, and required four iterations to attain the estimates precise to
4 decimal places, yielding b = 8.2889,** LD50 = 0.3660. The time re-
quired was 4 hours 20 minutes, while the minimum logit χ² estimate
needed only 12 minutes to complete and yielded LD50 = 0.3651.

The ratio of the time required by the two methods is 21 rather than 
30, but this is only for two particular examples. Actually the ratio 30 
does not adequately reflect the relative computational difficulty of the 
noniterative minimum logit χ² estimate and the maximum likelihood
estimate obtained by iterative procedures accomplished with the linear
transform. In the investigation referred to above there were 1331 3-dose
samples, for which it was necessary to accomplish maximum likelihood 
estimates by iterative procedures, and we used the linear transform 
method. We found that there were samples that did not converge to 
any finite estimate, even with the use of the device advanced by Fisher 
in the Appendix to Bliss’s paper (7), and this took much time to discover. 
It is remarkable that Fisher did not note in any of his voluminous writ- 
ings on maximum likelihood and on the basis of his experience with fitting 
the probit and logit lines that there were samples that did not yield a 
finite estimate. In an experiment recently set up on the basis of a sug- 
gestion made to me by D. J. Finney (in quite another connection), in 
which there were 3 equally spaced doses, 10 at each dose, with the true 
P’s at the successive doses, 10, 50 and 90 per cent respectively, we found 
that 12 per cent of the samples had an infinite estimate by maximum 
likelihood. For 4 equally spaced doses, with the true P at the lowest and 
highest dose 1 and 99 per cent respectively, more than 20 per cent of the 
samples have an infinite estimate by maximum likelihood. Fisher’s 
device of using a “working value” to replace the “empirical” value of
the transform is thus seen not to be a completely satisfactory solution
for the case of zero survivors. Then there were other samples which,
though they could be shown to have finite estimates, did not appear to
converge, even with very many cycles of iteration, and we had to turn
to other methods.


*Fisher and Yates obtained b = .7126. Our value, .7128, agrees with that of Garwood; a more
precise value calculated in this laboratory is b = 0.7128321.

**So that the estimate of b as finally given by Mather is really incorrect. Here, as is the case with
most other published probit estimates, the author did not realize that more iterations are required than
performed, to yield estimates with the stated precision. Actually it is doubtful that more iterations
could have been correctly carried out with the tables used. Practically never have I been able to check
the value of the slope constant b, even in authoritative texts devoted wholly or in part to the probit
method, to within 10 of the last figure given, a situation attributable directly to the difficulties of the
iterative procedure. Some of the published statements to the effect that the difficulties of the probit
maximum likelihood method have been exaggerated are supported with illustrations of estimates that
are incorrectly calculated. The difficulties of calculation result in inaccuracy and therefore, counter to
what Sir Ronald says, are of central importance.
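
The 12 per cent and 20 per cent figures quoted above for samples with an infinite estimate can be roughly checked by enumeration. The sketch below is only an illustration under stated assumptions, none of which comes from the text: a sample is counted as having no finite estimate when every dose except at most one shows 0 or all responders, with the zeros at the lower doses; the 4-dose experiment also has 10 subjects per dose; and its doses are equally spaced on the logit scale. The function names are hypothetical.

```python
from itertools import product
from math import comb, exp, log


def binom_pmf(r, n, p):
    """Probability of r responders out of n when the true rate is p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def is_separated(counts, n):
    """Assumed criterion: zeros below some dose, all responders above it."""
    return any(
        all(r == 0 for r in counts[:j]) and all(r == n for r in counts[j + 1:])
        for j in range(len(counts))
    )


def prob_no_finite_estimate(true_p, n):
    """Sum the probabilities of all samples meeting the assumed criterion."""
    total = 0.0
    for counts in product(range(n + 1), repeat=len(true_p)):
        if is_separated(counts, n):
            prob = 1.0
            for r, p in zip(counts, true_p):
                prob *= binom_pmf(r, n, p)
            total += prob
    return total


def logistic_rates(p_low, p_high, k):
    """True rates at k doses equally spaced on the logit scale (assumption)."""
    lo, hi = log(p_low / (1 - p_low)), log(p_high / (1 - p_high))
    return [1 / (1 + exp(-(lo + i * (hi - lo) / (k - 1)))) for i in range(k)]


three_dose = prob_no_finite_estimate([0.10, 0.50, 0.90], n=10)
four_dose = prob_no_finite_estimate(logistic_rates(0.01, 0.99, 4), n=10)
print(round(three_dose, 3), round(four_dose, 3))
```

Under these assumptions the enumeration gives about 0.12 for the 3-dose case and a little over 0.20 for the 4-dose case, in line with the percentages stated.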

I hope no animus will be attributed to the following question: In 
schools where the method of probit analysis is taught, are the statistical 
responsibilities implied in the use of iterative procedures, and the neces- 
sity frequently to perform many iterative cycles, carefully pointed out? 
If what is taught is the widely promulgated procedure of a single cycle 
of iteration based on a provisional line graphically estimated, then I do 
not believe that it is warranted to assume that the estimate has greater 
efficiency (smaller variance) than some easier method such as the 
Kärber, or the method of Behrens-Reed-Muench. As Cornfield and
Mantel have pointed out (8), failure to attain a precise maximum likeli- 
hood estimate because of insufficient iteration results in an estimate with 
larger variance. 

Sir Ronald Fisher must know that I have not discussed the minimum
logit χ² estimate only in respect to its computational simplicity. One of
my articles (4) was devoted almost exclusively to a discussion of theo-
retical considerations, and stressed caution in the physical interpretation
of the parameters of the function used. As respects statistical charac-
teristics, I showed in my first paper (6) that the minimum logit χ²
estimate practically always yielded a smaller χ² than the maximum
likelihood probit estimate. In a series of subsequent reports, the most
recent of which was presented at the Third International Biometric
Conference (2) (3) (5), I have presented other salubrious statistical
properties of the estimate. The logit function has sufficient statistics
and the minimum logit χ² estimate is sufficient, while the probit function
does not have sufficient statistics and the maximum likelihood probit
estimate is not sufficient. For the logit function, asymptotically, the
minimum logit χ² estimate has the same properties as the maximum
likelihood estimate, while for finite samples it has a smaller variance and
a smaller mean square error than the maximum likelihood estimate.
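
To illustrate why the minimum logit χ² estimate needs no iteration, here is a minimal sketch, assuming the usual form of the method: a weighted straight-line fit of the observed logits on dose, with weights n·p·q. The function name and the data are hypothetical and are not taken from the examples discussed above; samples with 0 or 100 per cent response, which need special treatment, are simply rejected.

```python
from math import log


def min_logit_chi2(doses, n, r):
    """One-pass weighted least squares fit of logit(p) = a + b * dose."""
    sw = swx = swl = swxx = swxl = 0.0
    for x, ni, ri in zip(doses, n, r):
        p = ri / ni
        if not 0 < p < 1:
            raise ValueError("0% or 100% response needs special handling")
        logit = log(p / (1 - p))   # observed logit
        w = ni * p * (1 - p)       # weight n * p * q
        sw += w
        swx += w * x
        swl += w * logit
        swxx += w * x * x
        swxl += w * x * logit
    b = (swxl - swx * swl / sw) / (swxx - swx**2 / sw)
    a = (swl - b * swx) / sw
    return a, b, -a / b            # intercept, slope, dose giving 50% response


# Hypothetical data: 10 subjects at each of three doses.
a, b, ld50 = min_logit_chi2([1, 2, 3], [10, 10, 10], [2, 5, 8])
print(a, b, ld50)
```

The whole fit is a single pass over the data followed by two divisions, in contrast with the repeated cycles that the probit maximum likelihood solution requires.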


REFERENCES 


1. Berkson, Joseph, “A statistically precise and relatively simple method of esti-
mating the bio-assay with quantal response, based on the logistic function,”
Journal of the American Statistical Association, 48 (1953), 565-599. This reference
was not in the original quotation.

2. Berkson, Joseph, “Minimum chi-square and maximum likelihood estimates of
regression coefficients,” Third International Biometric Conference, Bellagio,
Italy, September 4, 1953.

3. Berkson, Joseph, “Relative precision of least squares and maximum likelihood
estimates of regression coefficients,” (Abstract), Annals of Mathematical Statistics,
23 (1952), 148.

4. Berkson, Joseph, “Why I prefer logits to probits,” Biometrics, 7 (1951), 327-339.

5. Berkson, Joseph, “Relative precision of minimum chi-square and maximum
likelihood estimates of regression coefficients,” Proceedings of the Second Berkeley
Symposium on Mathematical Statistics and Probability, Berkeley and Los Angeles:
Univ. of Calif. Press, 1951, 471-79.

6. Berkson, Joseph, “Application of the logistic function to bio-assay,” Journal of
the American Statistical Association, 39 (1944), 357-65.

7. Bliss, C. I., “The calculation of the dosage-mortality curve,” Appendix by R. A.
Fisher, Annals of Applied Biology, 22 (1935), 134-67.

8. Cornfield, J., and Mantel, N., “Some new aspects of the application of maximum
likelihood to the calculation of the dosage response curve,” Journal of the American
Statistical Association, 45 (1950), 181-217.

9. Fisher, R. A., and Yates, F., Statistical Tables for Biological, Agricultural and
Medical Research. Second Edition. London and Edinburgh: Oliver and Boyd
(1948).

10. Garwood, F., “The application of maximum likelihood to dosage-mortality
curves,” Biometrika, 32 (1941), 46-58.

11. Mather, K., foreword by R. A. Fisher, Statistical Analysis in Biology. New York
City: Interscience Publishers, Inc., Second Edition, 1946.

12. Tables of Probability Functions, Vol. II. Federal Works Agency, Work Projects
Administration for the City of New York, 1942. Arnold N. Lowan, Technical
Director.



SUGGESTED DESK CALCULATOR OPERATIONS
FOR COMPUTING MOMENTS BY THE ROW 


F. M. Hemphill, Ph.D.
Associate Professor 
School of Public Health 
University of Michigan 


Those who compute moments of grouped frequency distributions 
on desk model calculators may find interest in the following simple 
feed-back design of computation. 

There are two general types of approaches to getting the required
sums: $\sum f$; $\sum f(U_i)$; etc.; $\sum f(U_i)^n$ and some selected sum for
Charlier’s Check (C.C.), e.g. $\sum f(U_i - 1)^n$. One approach is to compute
$\sum f(U_i)^n$ cumulatively and enter only the
sums. This approach usually makes use of tables of powers of n which
the computer feeds into the calculator, performing, say, $\sum f(U_i)^n$.
In this process, the value of each f is entered into the desk calculator
(n + 1) times the number of intervals (C.C. accounts for +1). This
method of solution by columns saves making individual entries and
requires no summation. On the other hand, failure to get C.C. means
recalculation of column(s). A second approach is solution and entry
of each $f(U_i)^n$ independently, then summing the respective columns.
The method described below takes the latter approach and involves
solution of $f(U_i)^n$ by rows, requiring no use of tables of powers and
requiring entry of f into the calculator only once for each class interval.
When recalculations are required, entries can be checked with great
rapidity.

The second approach is recommended as the digits in f, class inter-
vals, and moments to be computed are increased. If only $\sum f(U_i)$
and $\sum f(U_i)^2$ are desired, these usually should be solved directly without
individual tabular entry of products. Also, calculations through the
2nd moment are usually easier to recompute than to control by C.C.

The proposed row solution design may excel the columnar approach
for continuous routine solution of the first 3 or 4 moments by lessening
fatigue, offering increased speed in the re-check process (when indicated)
and eliminating tables of powers and repetitious entering of the digits 
of f. 

There follows an example and explanation of the steps of operating a 
proposed feed-back system for solution by rows of individual $f(U_i)^n$,
wherein n = 1, 2, 3 and 4.


EXAMPLE 

$U_i$ |   f | $f(U_i)$ | $f(U_i)^2$ | $f(U_i)^3$ | $f(U_i)^4$ | $f(U_i - 1)^4$
   -9 |  23 |     -207 |      1,863 |    -16,767 |    150,903 |        230,000
   -8 | 109 |     -872 |      6,976 |    -55,808 |    446,464 |        715,149
   -7 | 355 |   -2,485 |     17,395 |   -121,765 |    852,355 |      1,454,080


(The pattern having been established, the process may be continued. The following
sums are drawn merely to demonstrate that the process is following C.C. for the 3
rows.)


 SUMS | 487 |   -3,564 |     26,234 |   -194,340 |  1,449,722 |      2,399,229

There follows the proposed 5-step cycle for computations in each 
row of the example. 


Step 1. Enter 9 in Left Side of Keyboard (L.S.K.B.) and 23 in Right
Side of Keyboard (R.S.K.B.); lock K.B. (optional); multiply by
$(U_i - 1)^4$ and enter the product from R.S. Long Dial (23 × 10,000 =
230,000 in the example); clear the dials leaving K.B. intact (locked at
computer’s discretion). Multiply by 9 observing $(U_i)^2 = 81$ in L.S.
long dial (L.S.L.D.) and $f(U_i) = 207$ in R.S.L.D. Enter −207 in
tabulation (see example).


(Note that all instructions to “clear dials” may be ignored by sub-
stituting “work over” procedures to develop the desired multiplicand
on hand and semi-automatic models.)


Step 2. Clear dials and multiply by 81, $(U_i)^2$ from Step 1; record
$f(U_i)^2$ from R.S.L.D. which in the example is 1,863. Note that $(U_i)^3$
has been generated at L.S.L.D. as 729.


Step 3. Clear dials and multiply by 729, $(U_i)^3$ from Step 2; record
$f(U_i)^3$ as −16,767 from R.S.L.D.


Step 4. Clear dials and multiply by 6,561, $(U_i)^4$ from Step 3; record
150,903 from R.S.L.D.


Step 5. Change keyboard to contain, from the subsequent row,
$U_i = 8$ and f = 109; multiply by the $(U_i)^4$ from Step 3 which remained
in the short dial at end of Step 4, i.e., in the example 6,561 and record
$f(U_i - 1)^4$ for row of $U_i = -8$ from R.S.L.D. (715,149 in the example).


The feed-back cycle is established and the computer repeats Steps
1-4 over a constant K.B., developing $(U_i)^{n+1}$ and $f(U_i)^n$ at each step,
changing K.B. for the 5th step and using the readily available $(U_i)^4$
from the short dial of Step 4 multiplied by the f of subsequent row to
produce $f(U_i - 1)^4$ of the subsequent row. The selected example
shows computations through 3 rows, and the corresponding Charlier
check.
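
The feed-back cycle can be imitated in a few lines of code. The sketch below is an illustration, not a description of any particular machine: it assumes the keyboard holds |U| on the left and f on the right, that each multiplication yields |U| × multiplier in the left dial and f × multiplier in the right dial, and that the codes are consecutive negative integers, so that the multiplier left standing from the previous row equals $(U_i - 1)^4$. The function name is hypothetical.

```python
def moments_by_row(rows, first_multiplier):
    """Return (U, f, fU, fU^2, fU^3, fU^4, f(U-1)^4) for each row."""
    table = []
    carried = first_multiplier            # (U - 1)^4 for the first row
    for u, f in rows:
        left_kb, right_kb = abs(u), f     # keyboard set once per row
        check = right_kb * carried        # f * (U - 1)^4 for this row
        multiplier = left_kb              # Step 1 multiplies by |U|
        products = []
        for n in range(1, 5):
            left_dial = left_kb * multiplier     # |U|^(n+1), fed back next
            right_dial = right_kb * multiplier   # f * |U|^n, to be recorded
            sign = -1 if (u < 0 and n % 2 == 1) else 1
            products.append(sign * right_dial)
            carried = multiplier          # multiplier left in the short dial
            multiplier = left_dial        # feed-back: next multiplier
        table.append((u, f, *products, check))
    return table


example = moments_by_row([(-9, 23), (-8, 109), (-7, 355)],
                         first_multiplier=10_000)
for row in example:
    print(row)
```

Note that f is entered only once per row and no table of powers is consulted: every power of U is produced by the preceding multiplication.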


Comments: 


Tables of powers may be built as $(U_i)^{n+1}$ on desk calculators simul-
taneously with operations of form $f(U_i)^n$. The design presented
promises less error and fatigue in extensive calculations of moments on 
desk calculators than the use of a separate table of powers to be manipu- 
lated by the computer. Elements of the design may suggest program- 
ming steps for calculating punches such as IBM 602A and perhaps 
other continuous or repetitious calculating processes. Steps for using 
moments in developing descriptive constants of distributions are given 
by many authors, for example, 


REFERENCE 


Kenney, J. F.: Mathematics of Statistics, Vol. I, Chapter IV, D. Van Nostrand 
Company, Inc., New York, 1947. 



QUERIES 


George W. Snedecor, Editor


QUERY 106: Two experiments on the recovery time from carbon
dioxide anaesthesia were carried out on two sets each of fourteen
milkweed bugs. The first experiment consisted of two 4 × 7
Youden squares in which the columns were the individual bugs, rows
their order of treatment, and letters the length of exposure. The
second experiment, with the same designation for rows, columns, and
letters, represented two 7 × 7 Latin squares.
The data have been analyzed in terms of log-seconds of response,
leading to the following mean squares among bug totals and within
bugs,


                                | Experiment 1                                        | Experiment 2
                                | DF | MS                                             | DF | MS
Among bugs                      | 13 | .028688 = $\hat\sigma_1^2 + 4\hat\sigma_2^2$   | 13 | .044063 = $\hat\sigma_1^2 + 7\hat\sigma_2^2$
Within bugs                     | 33 | .011154 = $\hat\sigma_1^2$                     | 70 | .013472 = $\hat\sigma_1^2$
Bug component, $\hat\sigma_2^2$ |    | .004384                                        |    | .004370
Limits for $\sigma_2^2$         |    | .00099 to .01271                               |    | .00164 to .01189


after segregating the effects of treatment.

My question is: how do I weight my two estimates of the variance
components ($\hat\sigma_2^2$) in combining the two experiments? The random
variance component within bugs, I assume, is $\hat\sigma_1^2 = .012729$ with 103
degrees of freedom, obtained by adding the sums of squares and dividing
by the total degrees of freedom. The values for the bug component
$\hat\sigma_2^2$ computed separately agree so well that there is no question of their
similarity but their fiducial limits at P = .95 differ appreciably. How
should they be weighted in computing their mean?


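ANSWER: The problem, in a slightly more general form, is as follows.
We have 3 independent estimates of variance

$$
\begin{aligned}
v_1 &\sim \sigma_1^2, & \text{d.f.} &= n_1; & .012729 &\sim \sigma_1^2, & n_1 &= 103 \\
v_2 &\sim \sigma_1^2 + k_2\sigma_2^2, & \text{d.f.} &= n_2; & .028688 &\sim \sigma_1^2 + 4\sigma_2^2, & n_2 &= 13 \\
v_3 &\sim \sigma_1^2 + k_3\sigma_2^2, & \text{d.f.} &= n_3; & .044063 &\sim \sigma_1^2 + 7\sigma_2^2, & n_3 &= 13
\end{aligned}
$$

and we wish to estimate $\sigma_2^2$. Suppose that we take the weighted mean

$$w v_2 + (1 - w) v_3 .$$

Its expected value is

$$\sigma_1^2 + \{w k_2 + (1 - w) k_3\}\,\sigma_2^2 .$$

It follows that an unbiased estimate of $\sigma_2^2$ is

$$\frac{w v_2 + (1 - w) v_3 - v_1}{w k_2 + (1 - w) k_3} \qquad (1)$$

Assuming normality in the data, the variance of the estimate of $\sigma_2^2$ is

$$\frac{2}{\{w k_2 + (1 - w) k_3\}^2}\left[\frac{w^2(\sigma_1^2 + k_2\sigma_2^2)^2}{n_2} + \frac{(1 - w)^2(\sigma_1^2 + k_3\sigma_2^2)^2}{n_3} + \frac{\sigma_1^4}{n_1}\right]$$

The value of w which minimizes this variance is given by the equation

$$w\left\{\frac{k_3(1 + k_2\phi)^2}{n_2} + \frac{k_2(1 + k_3\phi)^2}{n_3}\right\} = \frac{k_2(1 + k_3\phi)^2}{n_3} - \frac{k_3 - k_2}{n_1} \qquad (2)$$

where $\phi = \sigma_2^2/\sigma_1^2$.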

Since the best weight depends on the unknown value of $\phi$, a method
of successive approximation is suggested. Start with some value of w
that seems reasonable. Use expression (1) to make a preliminary
estimate of $\sigma_2^2$ and hence of $\phi$. Substitute this value of $\phi$ in equation
(2) to determine a second approximation to w and hence a second
estimate of $\sigma_2^2$, which I believe will be adequate.

With data like those in the numerical example I would normally 
start with a trial value of w = 0.5, but this turns out to be so near the 
final answer that w = 0 will be used for illustrating the procedure. 
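Thus we start with the two equations

$$0.044063 = \hat\sigma_1^2 + 7\hat\sigma_2^2$$
$$0.012729 = \hat\sigma_1^2$$

giving

$$\hat\sigma_2^2 = 0.004476, \qquad \hat\phi = 0.352 .$$

Then from equation (2) we have

$$w\left\{\frac{7(2.408)^2}{13} + \frac{4(3.464)^2}{13}\right\} = \frac{4(3.464)^2}{13} - \frac{3}{103}$$

which gives $w = 0.5$ and from expression (1), $\hat\sigma_2^2 = 0.00430$.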
The essence of this method is that it uses a linear combination of $v_2$
and $v_3$, where the relative weight is chosen so as to minimize the variance
of the resulting unbiased estimate of $\sigma_2^2$. This method does not lead
to the maximum likelihood estimate of $\sigma_2^2$, but it should be adequate
for practical purposes. 
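
The two-stage calculation can be followed numerically. The sketch below is an illustration only: the function names are hypothetical, and the weight formula is equation (2) solved for w, in the form obtained by setting the derivative of the variance with respect to w equal to zero.

```python
def bug_component(w, v1, v2, v3, k2, k3):
    """Unbiased estimate of the bug component from expression (1)."""
    return (w * v2 + (1 - w) * v3 - v1) / (w * k2 + (1 - w) * k3)


def best_weight(phi, k2, k3, n1, n2, n3):
    """Variance-minimizing weight for a given variance ratio phi."""
    a = k3 * (1 + k2 * phi) ** 2 / n2
    b = k2 * (1 + k3 * phi) ** 2 / n3
    return (b - (k3 - k2) / n1) / (a + b)


# Mean squares, multipliers and degrees of freedom from the query.
v1, n1 = 0.012729, 103          # pooled within-bugs mean square
v2, k2, n2 = 0.028688, 4, 13    # among bugs, Experiment 1
v3, k3, n3 = 0.044063, 7, 13    # among bugs, Experiment 2

w_start = 0.0                                               # trial weight
first = bug_component(w_start, v1, v2, v3, k2, k3)          # first estimate
phi = first / v1                                            # variance ratio
w_next = best_weight(phi, k2, k3, n1, n2, n3)               # second weight
second = bug_component(round(w_next, 1), v1, v2, v3, k2, k3)
print(first, phi, w_next, second)
```

In this sketch the unrounded second weight comes out a little above one half; rounding it to one decimal, as the answer does, leads to the same final estimate of 0.00430.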

The next question is likely to be: how are fiducial limits for $\sigma_2^2$
obtained? I hesitate to attempt an answer, because I can suggest only a
crude approximation, and it appears to be difficult to investigate how
good or bad the approximation is. First compute by Satterthwaite’s
method (this Journal, Vol. 2, p. 110, 1946) an approximate number of
degrees of freedom to be ascribed to the weighted mean square

$$w v_2 + (1 - w) v_3 .$$

I find the d.f. to be 24.9, and for reasons of laziness would use 24 d.f.
Then appropriate fiducial limits for $\sigma_2^2$ are determined by Bross’ method
(this Journal, Vol. 6, p. 136, 1950) from the two relations

$$.036376 \sim \sigma_1^2 + 5.5\,\sigma_2^2: \quad \text{d.f.} = 24$$
$$v_1 = .012729 \sim \sigma_1^2: \quad \text{d.f.} = 103$$
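
The degrees of freedom quoted for the weighted mean square can be reproduced with the usual form of Satterthwaite's approximation, (Σ aᵢvᵢ)² / Σ (aᵢvᵢ)²/nᵢ. This is a minimal sketch with a hypothetical function name, applied to the two among-bugs mean squares with equal weights of 0.5.

```python
def satterthwaite_df(terms):
    """Approximate d.f. of a linear combination of mean squares.

    terms: iterable of (coefficient, mean_square, degrees_of_freedom).
    """
    terms = list(terms)
    numerator = sum(a * v for a, v, _ in terms) ** 2
    denominator = sum((a * v) ** 2 / n for a, v, n in terms)
    return numerator / denominator


# Weighted mean square 0.5 * v2 + 0.5 * v3 from the two experiments.
df = satterthwaite_df([(0.5, 0.028688, 13), (0.5, 0.044063, 13)])
print(round(df, 1))
```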


W. G. COCHRAN


QUERY 107: This query concerns the application of Sheppard’s
correction to the variance of a frequency distribution. The
notation employed in Snedecor: “Statistical Methods” will be
used. The code numbers X differ by unity.
The formula for the unadjusted variance is

$$s^2 = \frac{SfX^2 - (SfX)^2/n}{n - 1}$$

The adjusted variance is obtained by deducting $I^2/12$ from the variance
above, so that the formula becomes

$$\frac{SfX^2 - (SfX)^2/n}{n - 1} - \frac{I^2}{12} \qquad (a)$$

This method is also adopted by Fisher in “Statistical Methods for
Research Workers” Ch. III p. 13.

I maintain that this is the correct procedure. 

A celebrated biometrician has pointed out, however, that Sheppard’s
correction should be applied to the second moment about the mean of
the distribution, viz. to

$$\frac{SfX^2 - (SfX)^2/n}{n}$$

(c.f. Aitken “Statistical Mathematics” p. 44 et seq.)
The formula thus becomes:

$$\frac{SfX^2 - (SfX)^2/n}{n} - \frac{I^2}{12}$$

According to this line of reasoning, the second moment is a biased
estimate of variance, and an unbiased estimate is obtained by mul-
tiplying this by n/(n − 1), the final formula being

$$\left\{\frac{SfX^2 - (SfX)^2/n}{n} - \frac{I^2}{12}\right\}\frac{n}{n - 1} \qquad (b)$$

The estimate given by (a) exceeds that given by (b) by $I^2/12(n - 1)$,
a trivial amount if n is large.

Seeing that Sheppard’s correction is in itself an approximation, is 
there any difference between the methods? If so, which is the correct 
one to adopt? 

ANSWER: It should first be pointed out that the relation between
the exact moment $\mu_2$ and the grouped moment $\bar\mu_2$ is

$$\mu_2 = \bar\mu_2 - \frac{I^2}{12} \qquad (1)$$

This relation is itself approximate, but if the probability density 
function is sufficiently well-behaved (cf. Cramer, Mathematical Methods 
of Statistics, p. 362) the approximation is so good as to be considered 
exact for all practical purposes. Now in order to compare your two 
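methods of estimating $\mu_2$ and to consider them on the same grounds,
let us assume the relation (1) is exact. Let us write

$$s^2 = \frac{Sfx^2 - (Sfx)^2/n}{n - 1}$$

and

$$m_2 = \frac{Sfx^2 - (Sfx)^2/n}{n}$$

and denote your first and second estimates of $\mu_2$ by $\hat\mu_{21}$ and $\hat\mu_{22}$ respec-
tively. Then

$$\hat\mu_{21} = s^2 - \frac{I^2}{12}$$

$$\hat\mu_{22} = \left(m_2 - \frac{I^2}{12}\right)\frac{n}{n - 1} = s^2 - \frac{I^2}{12}\,\frac{n}{n - 1}$$

Let E denote the operation of taking the expectation. Then

$$E(\hat\mu_{21}) = \bar\mu_2 - \frac{I^2}{12} = \mu_2$$

$$E(\hat\mu_{22}) = \bar\mu_2 - \frac{I^2}{12}\,\frac{n}{n - 1} = \mu_2 - \frac{I^2}{12(n - 1)}$$

Hence, if (1) is exactly true, then $\hat\mu_{21}$ is an unbiased estimate of $\mu_2$,
whereas $\hat\mu_{22}$ is biased, and the amount of bias is actually

$$-\frac{I^2}{12(n - 1)}$$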


On these grounds, therefore, the estimate $\hat\mu_{21}$ is to be preferred. Even
if n were large, it is conceivable that I might also be so large that the
bias which results in using $\hat\mu_{22}$ could be appreciable. Therefore, one
would always be on the safe side by using $\hat\mu_{21}$. If, however, the prob-
ability density does not behave well enough to consider (1) as being
exact, then one would need more specific information about the distri-
bution to know which of the estimators $\hat\mu_{21}$, $\hat\mu_{22}$ is preferable.
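
The difference between the two ways of applying Sheppard's correction can be seen on a small grouped distribution. The sketch below is illustrative only: the frequency table and the function name are hypothetical, and the class interval is taken as the spacing of the coded values.

```python
def sheppard_estimates(values, freqs, interval):
    """Return the estimates (a) and (b) of the variance after correction."""
    n = sum(freqs)
    sfx = sum(f * x for x, f in zip(values, freqs))
    sfx2 = sum(f * x * x for x, f in zip(values, freqs))
    corrected_ss = sfx2 - sfx**2 / n        # sum of squares about the mean
    correction = interval**2 / 12           # Sheppard's correction
    est_a = corrected_ss / (n - 1) - correction
    est_b = (corrected_ss / n - correction) * n / (n - 1)
    return est_a, est_b


# Hypothetical coded frequency distribution with unit class interval.
values = [0, 1, 2, 3, 4]
freqs = [2, 5, 8, 4, 1]
est_a, est_b = sheppard_estimates(values, freqs, interval=1)
print(est_a, est_b, est_a - est_b)
```

The gap between the two results equals the squared interval divided by 12(n − 1), whatever the frequencies, which is the amount discussed in the query and the answer.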


JOHN GURLAND




ABSTRACTS 


Papers presented at the Third International Biometric Conference, 
Bellagio, Italy, Sept. 1-5, 1953 


239 L. MARTIN. (Faculté de Médecine de l’Université de Bruxelles;
Institut Agronomique de l’État à Gembloux). Teaching the Principles of
Experimentation and Statistical Methods to Biologists in Two Belgian
Institutions of Higher Education.

1. Faculté de Médecine de Bruxelles.

1. Since 1948, first-year students have taken a compulsory course in
General Mathematics (30 h. + 25 h. of practical work).
2. There are: 21. A compulsory course in general statistics (Prof.
R. Olbrechts) of 10 h. for public health physicians. 22. An optional
course on the elements of statistics applied to laboratory studies and
clinical research (Prof. L. Olbrechts and L. Martin) of 15 h., taken by
students, physicians and pharmacists doing research in the laboratory
and the clinic.
3. Aim of the courses: the value of planning research before it begins.
4. No practical work.
5. Syllabus of 22.
1. Population and sample. 2. Reduction of data.
3. Operational construction of increasingly complicated cumulative
distribution and frequency functions. 4. The Gaussian law.
5. Its integrals. 6. N.E.D. and probits. 7. The binomial law.
8. The Poisson law. 9. Test on 2 independent series. 10. The same,
by pairs. 11. Simple analysis of variance. 12. χ² test.
13-14-15. Experimental designs in biology and clinical trials.
6. Difficulties at point 3 of the syllabus with students who have no
mathematical training.

2. Institut Agronomique de l’État à Gembloux.

1. In the first year the students take the elements of analytic geometry
and infinitesimal analysis; in the second year, the elements of the
calculus of probabilities, mathematical statistics and the biometric
laws. The course in Principles of Experimentation and Biometry is
given in the fourth year.
2. The course in Principles of Experimentation and Biometry comprises
15 h.
3. Aim: the value of experimental designs drawn up before the research.
4. There are 15 h. of practical work.
5. Syllabus: broadly, the material contained in the Design of Experiment
of Cochran and Cox, up to p. 122.
5’. Trial of a post-graduate course: Mathematical Statistics (15 h.,
Teghem); Principles of Experimentation (30 h., Martin); Biological
Assays (30 h., Martin); Practical Rules of Experimentation (7 h., by
7 specialists).
6. Difficulties owing to the complete lack of linear algebra.


240 G. BARBENSI. (Firenze). The Teaching of Biometry.


Biometry having been defined on the basis of the concepts generally
accepted today, and the contribution made to its development by
biologists, mathematicians and statisticians having been considered, the
various phases of that development in Italy are reviewed, concluding
with a picture of the present situation, especially as regards the teaching
of the subject in our university institutes.

In view of the inadequacy of this teaching, its causes are analysed.
These are to be sought partly in the lack of legislative provision for
establishing courses adequate to the need, and partly in the difficulties
that students meet in following them, difficulties essentially attributable
to the lack of the necessary mathematical foundations.

Attention is drawn to the interest in the teaching of biometry shown by
Italian biometricians both at the first Italian Meeting of the Biometric
Society at Milano in 1951 and at this year’s Meeting at Firenze, at which
the Italian Region of the Biometric Society was constituted. This
interest took concrete form in two appeals to the Minister of Public
Education to institute the courses requested.

A programme of study is set out for students of the biological Faculties
(Natural Sciences, Biological Sciences, Agricultural Sciences, Medicine,
Veterinary Medicine, Pharmacy), taking account of their differing
levels of preparatory mathematical training, and consideration is given
to whether the courses should be fundamental (compulsory) or
complementary (optional), as the case may be.

The advisability of making the course in biometry compulsory for
certain categories of graduates, advanced students or assistants is then
examined.

The problem of training teachers in adequate numbers is examined, and
it is concluded that, once set on the road to realization, our higher
education too will be able to count biometry as a subject taught in the
biological Faculties, with the scope and development appropriate to the
importance it has acquired in the study of biological problems.

241 C. I. BLISS. (Yale University). A Course in Biometry for
Graduate Students in Biology.

Experience gained in teaching biometry at Yale University to gradu-
ate students majoring in different branches of biology is reviewed.
Originally two courses were offered in alternate years or terms, differing
partly in statistical content and approach but primarily in illustrative
examples, one for pharmacologists and bacteriologists and the other for
botanists and zoologists. Later these were combined into a single class
with the addition of a large proportion of forestry majors. The class has
averaged about nine students, consisted of two or three one-hour lectures
a week for one or two semesters and required from 12 to 15 laboratory
hours per week in the solution of numerical examples. The course is
introductory, involves a minimum of mathematics, and considers the
logic of experimental design and statistical analysis with the objective
of enabling potential biological investigators to design and evaluate their
own research more effectively and intelligently. A detailed outline has
been developed through the years and serves as the main text material.
How well the course has accomplished its aims and how it might be im-
proved are considered in the light of a poll of all students who have taken
the course.

242 A. VESSEREAU. (Institut de Statistique, Université de Paris).
Teaching Statistical Methods Applied to Biometry.

It was only after the last war that a coordinated effort was undertaken
to spread and apply statistical methods in the various fields covered by
the general term “Biometry”.
Instruction in these methods was established in 1946 at the Institut de
Statistique de l’Université de Paris; from the outset it has been attended,
and continues to be attended, by students intending to take up
agricultural research in the French overseas territories, as well as by
research workers already assigned to research centres. A few years
later, the Institut de Statistique also established a course in
“Population Genetics”.

A few years ago, the Institut National Agronomique placed the
teaching of statistical methods on the programme of its third year of
study (students intending to take up research careers: geneticists, soil
scientists, etc.); supplementary mathematics, and in particular notions
of the calculus of probabilities preparatory to this teaching, are given
during the first two years of study.

Within the Faculté des Sciences, students preparing for the certificate
in “genetics” receive the notions of statistics indispensable to that
subject. Students of applied psychology (Institut de psychologie de
l’Université de Paris) likewise receive basic statistical instruction and
specialized instruction.

While in the “biology” and “agronomy” sectors the teaching of
statistical methods is organized in a fairly coherent way, in the
“medical” sector there have so far been only fragmentary attempts (a
few series of lectures), insufficiently coordinated; important progress
remains to be made in this field.

Une des principales difficultés rencontrées dans |’enseignement des 
méthodes statistiques réside dans l’insuffisance de formation mathé- 
matique de la pluparte des éléves (leurs études antérieures ont été 
orientées surtout vers les sciences naturelles), et dans la nouveauté que 
présente pour eux le mode de raisonnement probabiliste. Cette situation 
pourrait étre améliorée si une initiation statistique et probabiliste était 
donnée au cours des études secondaires, et si certaines modifications 
étaient apportées au programme des connaissances exigées des candidats 
aux carriéres biologiques et médicales. 

Dans |’enseignement dispensé 4 des étudiants de formation mathé- 
matique parfois précaire, il est souvent préférable de renoncer au exposés 
rigoureux, et de faire appel 4 des raisonnements approchés ou intuitifs, 
sans toutefois masquer les difficultés; le strict domaine d’application des 
méthodes doit, d’autre part, étre bien précisé. 

Il a été reconnu que, sans sacrifier le notions théoriques essentielles, 
lenseignement doit rester aussi concret que possible. Les exercices 
pratiques sont indispensables et peuvent prendre plusieurs formes: étab- 
lissement ou vérification de lois statistiques et propriétés des échantillons, 
4 partir de tirages de boules, jeux de dés ou de cartes-petits problémes 
destinés a faciliter |’assimilation des parties théoriques de l’enseignement- 
applications numériques, avec exécution compléte des calculs et emploi 
de machines 4 calculer. 




243 SIR RONALD FISHER. (University of Cambridge). The
Variability in the Length of Germ Plasm Still Heterogeneous
After a Given Amount of Inbreeding.


Elementary inbreeding theory gives the expected proportion in which
that part of the germ plasm initially heterogeneous will be reduced by a
given procedure of inbreeding. The progress toward homogeneity is,
however, affected by chance at each stage, with the consequence that the
amount of heterogenic material varies greatly from one individual to
another, though obtained by the same procedure.

More exact knowledge of the nature and extent of this variation is
obtainable by two paths: 1. by the calculation of the numbers of
junctions formed by recombination at different stages of the inbreeding
process, and 2. by the calculation of the rate at which the germ plasm at
two different loci ceases to be simultaneously heterogeneous. Using
these two methods together the variance may be calculated, and the
parts ascribed to variation in the lengths of heterogeneous tracts, and
in their numbers, can be distinguished.
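
The distinction between the expected reduction and its chance variation can be sketched numerically. The example below is only an illustration under loud assumptions: it takes self-fertilization as the inbreeding procedure (the abstract names none) and treats loci as independent, so it ignores linkage and therefore says nothing about tract lengths or junctions, which are the subject of the abstract. The function names are hypothetical.

```python
import random


def expected_heterozygous_fraction(generations: int) -> float:
    """Expected fraction of initially heterozygous loci that are still
    heterozygous after the given number of generations of selfing.
    Under selfing the expectation halves in every generation."""
    return 0.5 ** generations


def simulate_individual(n_loci: int, generations: int, rng: random.Random) -> float:
    """Fraction of n_loci loci still heterozygous in one inbred line.
    ASSUMPTION: loci are independent (no linkage)."""
    still_het = n_loci
    for _ in range(generations):
        # Each heterozygous locus stays heterozygous with probability 1/2.
        still_het = sum(1 for _ in range(still_het) if rng.random() < 0.5)
    return still_het / n_loci


if __name__ == "__main__":
    rng = random.Random(1954)
    lines = [simulate_individual(100, 3, rng) for _ in range(10)]
    print("expected:", expected_heterozygous_fraction(3))
    print("simulated lines:", lines)
```

Individual lines scatter around the expected value even though all were produced by the same procedure, which is the variation the abstract sets out to quantify.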


244 K. MATHER. (Department of Genetics, University of Birmingham).
The Methodology of Biometrical Genetics.

Biometrical genetics rests on the fusion of Mendelian genetics with
the statistics of measurement. It has been demonstrated that the genes
of biometrical genetics are borne on the chromosomes so that the whole
range of mechanical phenomena established by Mendelian genetics must
apply. Indeed the assumption of these phenomena is the basis for all
analysis in biometrical genetics. The range of phenomena established
by Mendelian genetics for gene action may apply, but this must be
established step by step in biometrical genetics. Further, since the
statistical quantities and statistical techniques of biometrical genetics
differ from those of Mendelian genetics, a given phenomenon, though
traceable in both, may be recognised and measurable in different ways.
We must not, therefore, try to press biometrical genetics too much into
the form familiar from Mendelian genetics, nor must we try to
accommodate all Mendelian phenomena explicitly and completely in our
biometrical analyses just because they exist ready made for us. Rather
we should proceed gradually, using the results and concepts of Mendelian
genetics as a guide, but allowing biometrical genetics to develop its own
methodology and to build up its own conceptual structure (which we
should seek to relate to the older genetical theory), just as Mendelian
genetics proceeded in its earlier days. How far have we progressed?




Mendelian genetics warns us to expect six great classes of variation 
in our biometrical work; the environmental, the additive heritable, the 
effects of dominance, the effects of genic interaction, the effects of extra- 
nuclear particles, and the effects of genotype-environment interaction. 
Extra-nuclear effects exist, but are seldom encountered in a way which 
requires their incorporation in the analysis. The environmental and the 
additive genetic variation can be handled with confidence. The effects 
of dominance can easily be incorporated in analysis but are often very 
troublesome to measure for technical reasons. New genetical designs of 
experiment must be explored, and some already hold promise of over- 
coming the difficulties of measuring the effects of dominance. Interac- 
tions of non-allelic genes and of genetic and environmental agencies 
have been established by experiment, but we are still in no position to 
measure and interpret them. To do so will require consideration of the 
definition of interaction, which in biometrical genetics is a relative
phenomenon, of the wider possibilities and uses of scaling tests and of the 
development of a theory of interaction, probably more in the direction 
of the statistical notion of interaction than along the familiar lines of 
Mendelian genetics. Finally we shall require a theory of variability 
interpretable in terms of Mendelian theory but differing in structure 
from it, and the recognition of effective units of inheritance whose rela- 
tion to the genes of Mendelian genetics will require close consideration. 


245 D. C. LOWRY. (University of California, Berkeley, Cal.).
Variance Components with Reference to Genetic Population
Parameters.


One of the important problems both of applied and of theoretical 
genetics is the determination of the comparative effects of the various 
factors which affect the inheritance of quantitative characters. These 
characters may be controlled by the action of a large number of gene 
pairs, by environment and possibly by the interaction of genotype and 
environment. In analyzing their inheritance Fisher, Wright and others 
have found the analysis of variance a powerful tool in identifying these 
effects and in characterizing the multigenic system according to additive 
genetic effects, dominance deviations from the additive scheme and
non-allelic gene interaction. Their analyses have been based on models of
Mendelian heredity which involve some restrictive assumptions. 

The detection and interpretation of components of variance in the
study of quantitative traits involve statistical problems of two kinds:
first, the construction of models, and of experimental designs based upon
these models, containing fewer restrictive assumptions and, second, the
consideration of the purely statistical aspects of the functions of
components of variance used as estimates of population parameters.

This paper is intended as a review of what has been accomplished 
in these two phases of study, particularly the latter. 


246 J. L. LUSH. (Iowa State College, Ames, Iowa). Estimating 
Heritabilities. 


The logic of separating the differences between two individuals into 
a part caused by differences in their heredity and a part caused by 
differences in the environment to which they were exposed, is that of 
Taylor’s theorem. There may be a remainder or joint function if the 
effects of differences in heredity and of differences in environment do not 
combine additively. Choice of a suitable scale of measurement may 
help in minimizing this joint portion, although no simple and automatic 
rule for finding such a scale seems possible. 

All of the methods of estimating heritability compare the differences 
between individuals related in various ways. These comparisons can be 
expressed as correlations or regressions or ratios of variance components, 
according to convenience in computing or clarity in conveying the 
thought to the listener. 

The commoner methods concern isogenic lines; regressions on parent
and mid-parent, or selection experiments; and resemblances between half
and full sibs. Because of large sampling errors, individuals less closely 
related than grandparent and grandchild or than half-brothers, offer 
little information on heritability. 
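
The regression of offspring on mid-parent can be illustrated with a small simulation. This is a sketch under stated assumptions, not Lush's own procedure: the model is purely additive, mating is random, relatives share no environment, and the variance values and function names are hypothetical. Under those assumptions the regression slope estimates the heritability, here the ratio of additive to total variance.

```python
import random


def simulate_families(n, var_a, var_e, rng):
    """Return (mid_parent, offspring) phenotype pairs.
    ASSUMPTIONS: additive gene action only, random mating,
    independent environments for all individuals."""
    sd_a, sd_e = var_a ** 0.5, var_e ** 0.5
    pairs = []
    for _ in range(n):
        a_sire, a_dam = rng.gauss(0, sd_a), rng.gauss(0, sd_a)
        p_sire = a_sire + rng.gauss(0, sd_e)
        p_dam = a_dam + rng.gauss(0, sd_e)
        # Offspring breeding value: parental average plus Mendelian
        # sampling, whose variance is half the additive variance.
        a_off = 0.5 * (a_sire + a_dam) + rng.gauss(0, (var_a / 2) ** 0.5)
        p_off = a_off + rng.gauss(0, sd_e)
        pairs.append((0.5 * (p_sire + p_dam), p_off))
    return pairs


def regression_slope(pairs):
    """Least-squares slope of the second coordinate on the first."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(246)
    pairs = simulate_families(20000, var_a=0.4, var_e=0.6, rng=rng)
    print("true heritability 0.4, estimated:", round(regression_slope(pairs), 3))
```

Rerunning with far fewer families shows how quickly the sampling error grows, which is the reason the abstract gives for the limited value of distant relatives.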

Interpretation is beset with pitfalls which are partly statistical but 
arise partly from incorrect appraisal of the biological or physical cir- 
cumstances. Some of the commoner pitfalls, besides sampling errors, 
are the following: 

The data may have been selected in a way for which perfect correction
is not made.

The phenotypic variation may have been sharply discontinuous.

The scale of measurement may have been far from linear with the
effects of the genes or of environmental variations.

The correlations between the environments to which the relatives
were exposed may have contributed more to their phenotypic
resemblance than is supposed.

Non-randomness in the mating systems which produced these
individuals may not have been discounted adequately.

Much of the variance may have been due to dominance or epistasis.

Either the hereditary variance or the environmental variance may
be a little different in other populations to which the observer wishes
to apply his conclusions.


247 J. W. HOPKINS. (National Research Council of Canada). 
Some Needed Tests of Significance. 


Requirements for tests of significance of the following are illustrated
by experimental data. (i) Inconstancy of the negative binomial parameter
k characterizing each of mn samples of 2; needed to validate analyses
of blood counts before and after imposition of m treatments on groups
of n animals, assuming that random manipulative and secular
discrepancies result in successive counts on the same individual being
negatively binomially distributed with a k common to all mn individuals.
(ii) Inconstancy of the analogous hypergeometric variance parameter in m
samples; needed to validate simple criteria for 2-stage acceptance
sampling by attributes of large consignments of packaged items (e.g. boxed
fruit) when defectives are contagiously distributed between packages. 
(iii) Departures from goodness of fit of linear regression formulae when 
both variables are subject to error; needed to demonstrate inconsistencies 
in measurements by two procedures, e.g. standard and accelerated 
methods for moisture in grain. (iv) Inequality of means of m binomial 
variates of Poisson; needed to demonstrate differences in mean acuity of 
m groups of subjects in a taste experiment. 


248 M. KEULS. (Institute of Horticultural Plant Breeding,
Wageningen). Testing Differences Between Means in an Analysis
of Variance.


The problem: in applying an analysis of variance, after having
concluded from an F-test at a significance level 0.05 that the null-hypothesis
$\mu_1 = \mu_2 = \cdots = \mu_k$ has to be rejected, so that at least some of the
$\mu_i$ are different, most research workers feel the lack of a convenient
statistical procedure stating what differences should be considered real.
In general we may be interested in the question what contrasts within a
previously chosen subset of the set of all contrasts $\sum_i a_i \mu_i$
($\sum_i a_i = 0$) should be considered different from zero.

A procedure in general use is the least significant difference test
(l.s.d.-test), consisting of applying an ordinary t-test to each difference
or contrast separately if, and only if, the F-test rejects the
null-hypothesis. This procedure has been denounced by most writers to-day
as it greatly exaggerates the significance of the conclusions.

Several alternative procedures which fall into two classes have been
suggested. The oldest are the multilayer significance tests (Newman,
Duncan, Tukey, Keuls). Although these tests imply some nominal
significance level, the real significance levels are problematic. The other
class contains procedures of multiple confidence statements (Tukey,
Scheffé, Roy, Bose and Roy and others). Here nominal and real
significance levels are identical. The procedures are simple and will be clear
also to non-statistically trained readers of the records.

In discussing old and new procedures, the following points are of 
interest: 


1. There may be defined different significance levels or, according to
Prof. Tukey, "error rates".

2. There may be a choice between a significance level procedure and a
procedure of confidence statements.

3. A point that has been insufficiently stressed by writers on confidence
procedures, is to indicate, before choosing a procedure, what linear
combinations $\sum_i a_i \mu_i$ of the parameters $\mu_i$ we want to make a
confidence statement about.

4. Prof. Tukey gives some serious warnings against the multilayer
significance procedures proposed until now. He defines a standard
significance level as the maximum chance of rejecting some chosen null
hypothesis of the form ($\mu$'s arranged according to numerical value):

My = Mo = < Mea = = S = 


at a nominal significance level of say 0.05. Even in the more conserva- 
tive multilayer procedures this chance exceeds the nominal value 0.05. 
As an alternative Tukey himself, though more inclined to use the
confidence procedures, proposes for multilayer procedures a constant standard
level basis. In that case the standard significance level, say 0.05, should
be chosen as the nominal significance level. The resulting significance 
procedure will never exaggerate the reliability of the results. 
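
The idea of differing "error rates" can be made concrete with a Monte Carlo sketch. It is an illustration only and makes several assumptions that are not in the abstract: the error standard deviation is known and equal to 1, so normal rather than t quantiles are used; no preliminary F-test is applied; and the Bonferroni adjustment is a swapped-in, simpler stand-in for the multiple confidence procedures cited, not any of those procedures themselves. The function name is hypothetical.

```python
import itertools
import random
from statistics import NormalDist


def familywise_error_rate(m, n, alpha, trials, rng, bonferroni=False):
    """Monte Carlo estimate of the chance that at least one pairwise
    difference is declared real when all m population means are equal.
    ASSUMPTION: sigma = 1 is known, so each comparison is a z-test."""
    n_pairs = m * (m - 1) // 2
    level = alpha / n_pairs if bonferroni else alpha
    z_crit = NormalDist().inv_cdf(1 - level / 2)
    se_diff = (2 / n) ** 0.5  # standard error of a difference of two means
    false_alarms = 0
    for _ in range(trials):
        # Sample the m group means directly: each is N(0, 1/n) under the null.
        means = [rng.gauss(0, 1 / n ** 0.5) for _ in range(m)]
        if any(abs(a - b) / se_diff > z_crit
               for a, b in itertools.combinations(means, 2)):
            false_alarms += 1
    return false_alarms / trials


if __name__ == "__main__":
    rng = random.Random(248)
    print("unadjusted:", familywise_error_rate(5, 10, 0.05, 4000, rng))
    print("Bonferroni:", familywise_error_rate(5, 10, 0.05, 4000, rng, bonferroni=True))
```

With five groups the unadjusted pairwise tests declare some difference real far more often than the nominal 0.05, while the adjusted tests stay near or below it; this is the gap between nominal and real significance levels that the abstract discusses.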


249 M. J. HEALY. (Rothamsted Experimental Station). Decision 
Between Two Alternatives; How Many Experiments? 


When in agriculture or technology a new treatment or process is
suggested, the difference in output per unit $\eta$ between old and new
processes can be estimated by the mean $\bar{y}$ of the results of $n$ experiments.
The net gain associated with the new treatment may be taken to be a
linear function of $\eta$, say

    $$k'(\eta - c),$$

where $k'$ is a positive factor depending on the scale of application, and
$c$ is a constant which depends on the difference in cost of the old and new
treatments and the capital cost of the change. The new treatment will
be adopted if $\bar{y} - c$ is positive. We wish to arrive at the number of
experiments which is economically justifiable allowing for the losses due
to possible wrong decisions which in the long run will be added to the
cost of experimentation. If, therefore, the cost per unit of
experimentation is $k$, and $P$ denotes the probability of getting $\bar{y} - c > 0$,
we wish to minimize the difference between the cost of experimentation
and the expected net gain,

    $$R = kn - k'P(\eta - c).$$

This expression is the risk function, measured from the status quo.

The present note deals with the case, analogous to the double
sampling procedure of Dodge and Romig, in which $n$ has to be decided after
a single experiment (or unit of experimentation) has been carried out.
For simplicity we assume that the errors in all experiments are
independent normal deviates with known standard deviation $\sigma$. The
probability $P$ is now calculated allowing for the known result $y_1$ of the first
experiment. Moreover, the risk depends on the unknown $\eta$, which must
be eliminated; this has been done by averaging $R$ over the fiducial
distribution of $\eta$ (a normal distribution with mean $y_1$ and standard
deviation $\sigma$). Minimisation of the resulting function $R$ provides an
intuitively reasonable determination of $n$, and it has been shown that no
other rule can have a uniformly better performance. One notable result
brought out by the theory is that it is seldom economic to do a very small
amount of additional experimentation. The reason for this is the
disproportionate smallness of the chance of altering the decision indicated
by the preliminary experimentation. When $k$, $k'$, $\sigma$ and $y_1 - c$ are given,
the ratios $k'\sigma/k$ and $(y_1 - c)/\sigma$ suffice to determine $n$, and a nomogram
for doing this rapidly has been prepared.

The performance of the suggested rule has been evaluated for a
series of parameter values. A comparison has been made with the results
of an alternative rule, in which $n$ is chosen so as to maximize the gain
per unit outlay. Some work has also been done on an analogous
sequential sampling scheme.
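
The risk function itself is simple to evaluate. The sketch below is a simplified illustration and not the rule described in the abstract: it treats the true difference as known instead of averaging over its fiducial distribution after a first experiment, and it takes the adoption rule to be that the mean of the n results exceeds c. The function names and the numerical values in the usage lines are hypothetical.

```python
from statistics import NormalDist


def adoption_probability(eta, c, sigma, n):
    """P(mean of n experiments exceeds c) when each result is
    N(eta, sigma^2), so that the mean is N(eta, sigma^2 / n)."""
    return NormalDist().cdf((eta - c) * n ** 0.5 / sigma)


def risk(eta, c, sigma, n, k, k_prime):
    """R = k*n - k' * P * (eta - c), measured from the status quo.
    ASSUMPTION: eta is treated as known, unlike in the abstract."""
    return k * n - k_prime * adoption_probability(eta, c, sigma, n) * (eta - c)


def best_n(eta, c, sigma, k, k_prime, n_max=200):
    """Number of experiments in 1..n_max giving the smallest risk."""
    return min(range(1, n_max + 1),
               key=lambda n: risk(eta, c, sigma, n, k, k_prime))


if __name__ == "__main__":
    # Hypothetical values: a modest true advantage and cheap experiments.
    n_star = best_n(eta=1.5, c=1.0, sigma=2.0, k=0.05, k_prime=100.0)
    print("n minimising the risk:", n_star)
    print("risk at that n:", risk(1.5, 1.0, 2.0, n_star, 0.05, 100.0))
```

The first term grows linearly with n while the second saturates as P approaches 0 or 1, so the minimum is reached at a finite number of experiments.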




250 C. RADHAKRISHNA RAO. (Indian Statistical Institute,
Calcutta). A General Theory of Discrimination When the
Information About Alternative Population Distributions is Based
on Samples.


The problem may be stated as follows: 


Two samples of sizes $n_1$ and $n_2$ are available from two populations
$P_1(x \mid \theta_1)$ and $P_2(x \mid \theta_2)$ where $x$ stands for all the measurements
and $\theta$ for all the parameters. An individual with given measurements $y$
has to be assigned as a member of one of the two groups basing the decision
on the observed values only, the parameters occurring in the alternative
distributions being unknown. In this paper an attempt is being made to
lay down a decision rule independent of the unknown parameters.

If the measurements are $p$ in number, we have a total of $(n_1 + n_2 + 1)p$
observations which can be represented by a point in a Euclidean space.
The decision rule requires the division of the space into two regions $R_1$
and $R_2$ such that when the point of observation falls in $R_1$ the individual
is assigned to the first group and otherwise to the second. Whatever
may be the set of regions, it should have the property that errors of
classification when the alternative populations are different must be
smaller than those when the populations are the same. This criterion
leads to the restriction that the size of each region should be the same
whenever the two probability densities $P_1$ and $P_2$ are identical
irrespective of what the actual values of the common parameters are.

We have now to fix the size of the regions $R_1$ and $R_2$ when $P_1$ and $P_2$
are identical. When the population distributions are identical, the 
decision may be equivalent to that of tossing an unbiassed coin so that 
it is reasonable to take each size as 50 percent. The special case of fixing 
the size at the 5% level leads to a test of the null hypothesis that the 
individual belongs to the first group at level 5%, the alternative being 
the second group. 

The problem is now to determine such similar divisions $R_1$, $R_2$ covering
the entire space which have fixed values when the two distributions are
identical and for which the errors of classification are a minimum. The
problem of minimising the errors reduces to dividing the region common
to the surfaces of sufficient statistics as in the general problem of testing
composite hypotheses.

Again, in all cases no uniformly best division is possible on the
surfaces of sufficient statistics. We may then determine regions for which
the errors of classification are least locally, i.e. for small departures from
the equality of populations.




The theory is general and can be applied even when the alternative 
distributions are more than two. 


251 L. MARTIN. (Faculté de Médecine, Université de Bruxelles).
Suggestions pour la Collection et l'Analyse de Données
Longitudinales en Gérontologie.


Suppose a physician finds that cholesterol is normal (1.8 g/l) in a
patient A and too high (2.8 g/l) in a patient B who shows clinical signs
of arteriosclerosis. Both patients are of the same age, say 60. One may
suppose that at age 40 blood cholesterol was normal in both cases. If so,
from what age did patient B go "out of control"? If that moment can be
determined, the method of preventive medicine will have a good chance
of succeeding, since the lesions, in the broad sense of the word, are
perhaps still reversible. It is proposed to determine fiducial limits for
the mean and for a single observation at a given age by fitting individual
orthogonal polynomials to data collected on a sample of patients followed
year by year from the age of 40. The size of this sample and the manner
of drawing it will depend on technical facilities. Such a method has been
discussed with Prof. W. G. Cochran and applied to the development of
histaminolytic activity in 18 women followed through the 9 months of
pregnancy. Reference is made to a suggestion of Sjögren and another of
Tanner. The latter studies, for the growth of children, the respective
merits of the information drawn from cross-sectional data (at a fixed
time) and from longitudinal data (the same individual followed at
successive times).

The author suggests collecting data on a collaborative scale within and
between civilized countries.


252 R. PRIGGE. (Paul-Ehrlich Institut, Frankfurt a. M.). Die
Anwendung der "Mutungsbereiche" in der Immunitätsforschung.


Sections I and II consider what inferences may be drawn from the
results of an immunization experiment, obtained on a sample, about the
composition of the whole population from which the sample was taken.
The various procedures for determining the "Mutungsbereich" (confidence
interval), within which the unknown true proportion of immunized
animals may be expected to lie, are discussed. The probability attaching
to statements obtained in this way is indicated. Section III discusses an
approximate procedure whose results agree well with those obtained by
the exact methods and which considerably reduces the computational
work.

Section IV discusses the hit-theory and the variation-statistical
interpretations of immunization curves and shows that the
Mutungsbereiche make it possible to decide between the two attempted
explanations.

Section V deals with the use of the Mutungsbereiche to determine the
relative potency of two vaccines. Section VI demonstrates the
applicability of the approximate solution to the domain of rare events
and discusses its use for determining the germ content of liquids.
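
The contrast between exact and approximate limits for a proportion can be sketched as follows. The abstract does not say which exact or approximate procedures are meant; the Clopper-Pearson limits and the simple normal approximation used here are assumed stand-ins, and the function names are hypothetical.

```python
from math import comb
from statistics import NormalDist


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x + 1))


def _solve(f, target, increasing):
    """Bisection on [0, 1] for f(p) = target, with f monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_limits(x, n, conf=0.95):
    """Clopper-Pearson limits for the true proportion, x successes in n."""
    alpha = 1 - conf
    # Lower limit solves P(X >= x | p) = alpha/2 (increasing in p).
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper limit solves P(X <= x | p) = alpha/2 (decreasing in p).
    upper = 1.0 if x == n else _solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


def approximate_limits(x, n, conf=0.95):
    """Normal-approximation limits, clipped to the interval [0, 1]."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


if __name__ == "__main__":
    # Hypothetical experiment: 14 of 20 animals immunized.
    print("exact:      ", exact_limits(14, 20))
    print("approximate:", approximate_limits(14, 20))
```

The approximation needs one square root where the exact limits need a search over binomial sums, which suggests why a good approximation mattered for hand computation.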


253 J. IPSEN. (Institute of Laboratories, Boston, Mass.). Factors
of Dosage and Host, Determining Antibody Response to
Secondary Antigen Stimulus.


A secondary stimulus is an injection of antigen in individuals who 
have previously been exposed to the same antigen. 

The antibody concentration in the serum after a secondary stimulus 
is dependent on 


(1) The immunity status when the secondary stimulus is given. 
(2) The antigenic potency of the secondary stimulus. 
(3) A negative interaction between the factors of (1) and (2). 


The immunity status is in practice estimated in two ways. One is 
the antibody concentration which can only be used as a comparable 
function of the immunity status of different individuals, if comparable 
time intervals have elapsed since primary exposure has occurred and if 
the antibody concentration is above the measurable level in the majority 
of the individuals. The second estimate can be obtained if the potency 
of the primary dose is known and the individuals have comparable 
immunizability. Immunizability is an inherent host characteristic which 
determines the response to primary stimulus. It can be measured by 
the dose of antigen which is necessary to confer a given primary immune 
status. 

Immunization of 128 inmates of a school for mentally retarded
individuals was performed with two injections of tetanus toxoid 28 days
apart. Eight different doses were given in each injection according to a
latin square design for the first and second injections. Antibody titers
were only measurable in 50 individuals prior to the second injection.
The antitoxin titers 14 days after the second stimulus were fitted to the
following expectancy formula:

    $$Y = a + b_1 x_1 + b_2 x_2 - \ldots$$

where $Y$ is log antitoxin titer and $x_1$ and $x_2$ are the log potencies of the
first and second dose, respectively.

A satisfactory fit could only be obtained if the individuals were
divided into three groups according to certain somatic criteria, and a
parameter $c$ was introduced, being constant for each group:

    $$Y = a + b_1(x_1 + c) + b_2 x_2 - \ldots(x_1 + c)$$

The variable $c$ is interpreted as the primary immunizability, inherent
with the somatic characteristic of the individual.


254 L. B. HOLT. (Wright Fleming Institute, London).
Quantitative Studies in Diphtheria Prophylaxis.


An attempt has been made to characterise the antigenicity of any 
diphtheria prophylactic in mathematical terms (responses to a single 
inoculation). 

Use is made of the observation that the responses among a group of
similar subjects, identically treated, are log-normally distributed; as well
as the observation that when the results of a dose-response experiment
are plotted as probit y% against log dose administered a straight line is
obtained: the probit regression line.

It was found that three variables are involved, namely $d$, the dose
required to produce some arbitrary reference point of response, $b$, the
slope of the probit regression line, which gives information in respect of
the percentage increase of subjects attaining or exceeding some arbitrary
level of response with increase of dose, and $\sigma_{\log}$, the standard deviation
of logs of titres. These three variables are incorporated into one general
equation:


Log = low GAM Crees »-(Iog 


The product of $\sigma_{\log}$ and $b$ for any one set of data is a constant ($A$)
which is numerically equal to the slope of the dose-response curve of
log G.M. against log dose.

The value of $A$ for the diphtheria prophylactic P.T.A.P. measured
in children was found to be approximately 0.86 and in guinea pigs about
0.6; and the $\sigma_{\log}$ of titres 0.65 for children and 0.51 for guinea pigs.

Evidence is offered to show a marked dissimilarity in the constants
for other prophylactics measured in children and in guinea pigs;
sometimes the guinea pig will underrate one prophylactic relative to
children, and overrate another kind of prophylactic.

The importance of the laboratory use of appropriate “Standard 
Antigens” that have been calibrated in the field is stressed. 


255 F. YATES. (Rothamsted Experimental Station). The Place of
Simple Experiments on Cultivators' Fields in Agricultural
Development.


In countries such as India the level of fertility and other conditions 
on experimental farms tend to be very different from those on cultivators’ 
fields. It is unwise, therefore, to assume that the responses to fertilizers 
and other agricultural treatments will be the same on experimental farms 
and cultivators’ fields. Consequently, there is a need to arrange for 
experimental tests on cultivators’ fields. 

The most suitable type of test for this work depends on local
conditions. In countries such as India very simple tests involving three
treatments only have been recommended. The questions that can be
investigated by such tests are very limited, but with tests containing five
or six treatments more complicated questions can be investigated. Devices
that can be used under such circumstances are:

1. Careful selection of treatments.
2. Division of the questionnaire into parts.
3. Confounding with sites.
4. Incomplete block designs.

Examples are given of the application of these devices to Indian 
problems. 


256 V. G. PANSE. (Indian Council of Agricultural Research, New
Delhi, India). Principles of the Survey Method of
Experimentation.


The urgent need of increasing agricultural production by passing on
to farmers the results of agricultural research has led to a rapidly growing
emphasis in India during the last few years on what may be termed the
survey method of experimentation, as against the classical type of
experiments at agricultural experiment stations. The results from the
latter, valuable as they are, cannot be recommended directly for large
scale use under actual farming conditions, because the number of
experiment stations is small and cannot be regarded as fully representative
of the tract served by them. Experiments on a representative sample of
the cultivated area therefore become necessary before recommending a
technique to farmers. Such a sample can be secured only by selecting
fields for experiments randomly out of fields on which a particular crop is
grown in the tract.

The aim of the experiments is generally to estimate with a reasonable
degree of precision the average response to treatments over the tract, to
detect any interaction of these responses with variation among the
agriculturally homogeneous sub-divisions into which the tract may be
divided, and to estimate the responses in the individual sub-divisions. The
precision of the estimates is based on the random variation among fields
selected for experiments, or rather on the interaction of treatment
responses with this variation. Consequently, the importance of the
experimental error, as calculated in the classical replicated experiment,
recedes to the background, and replication in the same field, which is
essential for estimating the latter, may be sacrificed altogether in order
to secure more information on the variation among fields. Replication in
the sense of the number of repetitions of the experiment in different
fields is, of course, important, but randomization of treatments in each
field has not the same role in securing the unbiased comparison between
the treatments as in the classical experiment. It is, however, safe to
adhere to it in order to avoid possible biases arising from border and
competition effects peculiar to any fixed arrangement of experimental
plots in a field. The third principle of replicated experiments, namely,
local control involving a compact arrangement of plots and attention to
size and shape of plots and blocks, is also of little importance, since the
variation between fields is far greater than the variation between plots in
a field which local control is intended to minimise. This allows the
latitude needed for fitting the experiment into the farmer’s schedule of
operations with the least possible disturbance of the latter, which is an
essential organisational consideration.
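
As a rough illustration of the point about precision, the sketch below (in
Python) estimates the average response over a tract from one
treatment-minus-control difference per field, and bases the standard error
on the variation of that difference between fields rather than on a
within-field error. The function name and the yield figures are assumptions
made for the example; they do not come from the abstract.

```python
import math
import statistics


def average_response(field_responses):
    """Mean response over the tract and its standard error.

    field_responses holds one treatment-minus-control yield difference
    per randomly selected field (hypothetical data).
    """
    n = len(field_responses)
    mean = statistics.mean(field_responses)
    # Between-field standard deviation of the response, i.e. the
    # treatment x field interaction on which the precision rests.
    sd_between_fields = statistics.stdev(field_responses)
    standard_error = sd_between_fields / math.sqrt(n)
    return mean, standard_error


if __name__ == "__main__":
    mean, se = average_response([2.0, 4.0, 6.0, 8.0])
    print(f"average response = {mean:.2f}, standard error = {se:.2f}")
```

No replication within a field enters the calculation, which is why such
replication can be given up in favour of more fields.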

Two sets of projects on the survey type of experiment have been
undertaken in India recently. In one scheme, following the recommendations
made by Dr. A. B. Stewart in his report on Soil Fertility Investigations in
India, two or three treatments with the local practice as control are tried
in each field selected randomly in the tract. The treatments are those
which State Departments of Agriculture consider as likely to increase the
yield on the basis of the past results at experiment stations. A more
ambitious programme of experiments has been commenced this season in
several areas under the fertilizers research project, sponsored jointly by
the Government of India and the T.C.A., with the object of acquiring
information on the response of certain important food crops to different
nitrogenous and phosphatic fertilizers.

A somewhat different type of experiment is represented by the
assessment surveys conducted by the Indian Council of Agricultural
Research in different States in order to estimate the additional yield
resulting from various Grow More Food schemes, such as distribution of
improved seed and fertilizers, provision of new sources of irrigation,
tractor plowing of weed-infested land, etc. Comparison is made between
the yield in the area which has received one or more of these aids and an
appropriate control. The experiment is not a strictly controlled one, in
that the treated areas are not selected randomly vis-à-vis the control and
the comparison may therefore be open to bias. These surveys form an
example of operational research in agriculture and have yielded much
valuable data.


257 T. N. HOBLYN and S. C. PEARCE. (East Malling Research
Station, Kent). Some Considerations in the Design of Successive
Experiments on Fruit Plantations.


Some points needing consideration in designing trials with long-lived 
plants, especially trees, are: 


1— The initial trial. 


Reliance has usually to be placed on a single trial and consequently 
it is necessary for the questions under investigation to be posed clearly 
from the beginning. 

Trees are large and there is often only one to a plot. The performance
of a single tree is not determined solely by positional effects but is a 
complex of its history and environment; consequently adjustment by 
covariance on to past performance is often an advantage. 


2— Subsequent trials. 


Plants often outlast the experiment for which they were initially 
intended and it is then necessary to provide for the application of further 
treatments when the first set are no longer of interest. 

Where there is little likelihood of the new and original treatments 
interacting, the new treatments may be applied orthogonally to the 


4 
4 
‘| 
Vet 
| 
j 
| 
} 
| 
| 
| 
4 | 
| 
| 
if 


ABSTRACTS 177 


blocks or original treatments, or they may be balanced either totally or 
partially, or they may be supplemented. In this last device one treat- 
ment, usually the control, occurs a different number of times from the 
rest, which are balanced among themselves. From a consideration of 
available useful designs, it is concluded that trials in which a further set
of treatments is likely to be called for are best designed, if in randomized 
blocks, with the number of blocks and of original treatments either equal 
or differing only by one. In the latter case repeated changes become 
possible as the residual effects of former treatments disappear. 


258 D. J. FINNEY. (University of Oxford). Functional Relationships
in Experimentation (Their Role in the Design and Analysis
of Experiments).


Agricultural and biological experiments may be divided into three 
categories in respect of the extent to which functional relationships are 
important: 


1)—Experiments in which such relationships are irrelevant or even 
non-existent. 

2)— Experiments for which an underlying structure of functional rela- 
tionships between the effects of different treatments is apparent, but 
only certain characteristics are directly of interest and other details 
of the relationship matter little. 

3)—Experiments whose primary object is the study of a functional
relationship. 


This classification is an operational convenience rather than a clear- 
cut separation. 

Examples of experiments falling into these categories will be pre- 
sented. Category 1) clearly contains little of interest for the present 
purpose and 3) occurs relatively infrequently. Evidence will be presented
that, for category 2), in a well-designed experiment characteristics
of the functional relationship other than those under study often do not
materially affect the validity of estimation; for example, the position and 
magnitude of a maximum on a curve may be estimated fairly satisfac- 
torily without knowledge of the precise form of the curve, provided that 
observations have been made on points well-distributed about the maxi- 
mum. Nevertheless, careful utilization of all existing information on a 
functional relationship at the time of planning a new experiment may 
help greatly in the obtaining of high precision for new estimates. 
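
The remark about the maximum of a curve can be illustrated by a small
sketch. It fits a quadratic to observations spread about the maximum and
reads off the position and height of the fitted vertex, although the
response used to generate the data is not a quadratic. The function name,
the Gaussian-shaped response and the dose levels are assumptions chosen for
the example only.

```python
import numpy as np


def estimate_maximum(x, y):
    """Position and height of the maximum of a least-squares quadratic."""
    c2, c1, c0 = np.polyfit(x, y, 2)  # coefficients, highest power first
    if c2 >= 0:
        raise ValueError("fitted quadratic has no maximum")
    x_max = -c1 / (2.0 * c2)
    y_max = c0 - c1 ** 2 / (4.0 * c2)
    return x_max, y_max


if __name__ == "__main__":
    # Hypothetical response with its true maximum at x = 3, height 1.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.exp(-((x - 3.0) ** 2) / 8.0)
    print(estimate_maximum(x, y))
```

With the points placed evenly about the maximum, the estimated position is
close to the true one even though the assumed form of the curve is wrong.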




259 J. BERKSON. (Division of Biometry and Medical Statistics,
Mayo Clinic, Rochester, Minnesota, U.S.A.). Maximum Likelihood
and Minimum Chi-Square Estimates of Regression Coefficients.


Several functions will be briefly discussed, but chiefly attention will
be given to the logistic function, $P = 1/(1 + e^{-(\alpha + \beta x)})$, with
the observation on $P$ assumed to be a random binomially distributed
variable, as in the model for bioassay with quantal response. Three
estimators are investigated: (1) Maximum likelihood, (2) Minimum $\chi^2$
(Pearson), and (3) Minimum logit $\chi^2$ (Berkson, J., J. Am. Statist. A.,
39 [1944] 357).

Two cases are considered, (1) $\beta$ known, $\alpha$ to be estimated, and
(2) $\alpha$ and $\beta$ both to be estimated. In the example dealt with,
there are three equally spaced values of $x$ (“doses” in bioassay) with
$N = 10$ at each. The dose arrangements are for $P = 0.3, 0.5, 0.7$,
corresponding respectively to the three consecutive doses, and for other
sets of three doses each, in which the value of $P$ corresponding to the
central dose is 0.6, 0.7, 0.8, 0.85.
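
For orientation, the third estimator can be sketched in a few lines. The
minimum logit $\chi^2$ estimate is taken here to be the weighted
least-squares line of the observed logits on dose, with weights
$N p (1 - p)$; the function names and the observed proportions are
assumptions for the example, and observed proportions of 0 or 1, which need
special treatment, are not handled.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def min_logit_chi2(x, n, p):
    """Weighted least-squares fit of logit(p) = alpha + beta * x.

    x: doses, n: group sizes, p: observed proportions (0 < p < 1).
    Weights n * p * (1 - p) are used for each dose.
    """
    w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]
    l = [logit(pi) for pi in p]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    l_bar = sum(wi * li for wi, li in zip(w, l)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxl = sum(wi * (xi - x_bar) * (li - l_bar)
              for wi, xi, li in zip(w, x, l))
    beta = sxl / sxx
    alpha = l_bar - beta * x_bar
    return alpha, beta


if __name__ == "__main__":
    # Hypothetical outcome in which the observed proportions happen to
    # equal the design values P = 0.3, 0.5, 0.7 at three equal spacings.
    print(min_logit_chi2([-1.0, 0.0, 1.0], [10, 10, 10], [0.3, 0.5, 0.7]))
```

Unlike maximum likelihood, this estimate needs no iteration, which is part
of its practical appeal.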

In the case with $\beta$ known, $\alpha$ to be estimated, the results are
based on calculations of the total sampling population; in the case with
both $\alpha$ and $\beta$ to be estimated, they are based on a sample of
2,000 with each dose arrangement.

For central dosage $P = 0.5$ each of the three estimators is unbiased;
for other dosage arrangements each is biased, the maximum likelihood
estimate positively, each of the $\chi^2$ estimates negatively. The mean
square error and variance about the mean are largest for the maximum
likelihood estimate, smaller for the minimum Pearson $\chi^2$ estimate, and
smallest for the minimum logit $\chi^2$ estimate.

The estimates are considered in relation to the Cramér-Rao lower
bound for the mean square error. The bound value itself is highest for
the maximum likelihood estimate, lower for the minimum $\chi^2$ estimate,
and lowest for the minimum logit $\chi^2$ estimate, but the m.s.e. of all
three estimates is higher than their respective lower bound value.

Each of the estimators is sufficient. Blackwell’s theorem (Ann.
Math. Statist. 18 [1947] 105) may therefore appropriately be applied.
The “Blackwellized” value of the estimate (conditional expectation of
estimate for fixed value of sufficient statistic) is the same for the maximum
likelihood estimate as before Blackwellization, but with the minimum
logit $\chi^2$ estimate, the mean square error is diminished by
Blackwellization to its lower bound value.




260 MAURICE FRÉCHET. Réhabilitation de la notion statistique
de l’homme moyen (Rehabilitation of the Statistical Notion of the
Average Man).¹


Quetelet, the great Belgian statistician, had proposed a very natural
statistical definition of the average man: the average man of a population
was a man each of whose characteristics (weight, height, etc.) would be the
mean of that characteristic in the population.

But soon Cournot, and then Joseph Bertrand, showed by geometrical
examples that this definition could lead to contradictions. It was not
certain that a man endowed with these average characteristics could survive
or even exist.

I have sought a definition that is reasonable enough in itself and
escapes this objection. I have succeeded by drawing on my theory of random
elements of arbitrary nature.²

The idea is to put a figure on the overall deviation (A, P) of the set of
characteristics between an individual A of the population P under
consideration and P itself. The average man of P will evidently be the one
for whom this deviation is smallest.

There may be several average men of P, just as there may be several
medians of a set of numbers. But in any case such an average man, being an
individual of the population P, necessarily exists, so that the objection
of Cournot and Bertrand is avoided.

It remains to indicate how to calculate the deviation (A, P).

We begin by departing from the hypothesis that the mean is the only
representative value of a set of numbers, and we bring in a “typical value”
of such a set, which may be its mean, a median, a dominant value
(unfortunately called the “mode”), etc.

We then consider, for two individuals A, B of P, their heights a, b;
their weights a', b'; etc. The deviation (A, B) of A and B will be, by
convention, a typical value of the set of differences |a − b|, |a' − b'|,
etc.³ And the deviation of A from P will be a typical value of the set of
deviations (A, A) = 0, (A, B), (A, C), ... of A from each individual of P.

A variant of this definition, at which Mme Defrise (of Brussels) and I
arrived independently, consists in taking for (A, P) the deviation (A, p)
of A from the fictitious individual p each of whose characteristics would
be a typical value of the set of values of that same characteristic in the
population P.

The same two definitions extend to the typical individual of any
population whatsoever: animal, vegetable, inanimate, etc.

We have indicated in Biometrics applications of these notions to the
definition of race.


¹ What follows summarises a 24-page booklet printed under the same title
in the “Collection de conférences faites au Palais de la Découverte”, 1949,
Palais de la Découverte, Avenue Franklin Roosevelt, Paris.

² See Chapter VI of the second edition of my work “Recherches modernes
sur la Théorie des probabilités, Premier Livre”.

³ So that these differences may be of the same nature and independent of
the scales, one may take for a, a', ... the quotients of the direct
measurements by their respective dispersions in P, or else the ranks of A
among the values of the weights, heights, etc., each arranged in order of
magnitude. One may also take account of the dependence and the importance
of the various characteristics by assigning them suitably chosen “weights”.
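
The first definition lends itself to a short computational sketch. In the
Python below, each individual is a tuple of measurements, differences are
divided by the dispersion of the characteristic in the population (one of
the scalings the author mentions), and the “typical value” is a parameter
that defaults to the median. All names and the sample population are
assumptions made for the illustration.

```python
import statistics


def deviation(a, b, scales, typical):
    """Deviation (A, B): typical value of the scaled absolute differences."""
    return typical([abs(x - y) / s for x, y, s in zip(a, b, scales)])


def average_individuals(population, typical=statistics.median):
    """Indices of the individuals whose deviation (A, P) is smallest."""
    n_chars = len(population[0])
    # Dispersion of each characteristic; fall back to 1.0 if it is constant.
    scales = [statistics.pstdev([ind[j] for ind in population]) or 1.0
              for j in range(n_chars)]
    # Deviation (A, P): typical value of (A, B) over every B in P.
    scores = [typical([deviation(a, b, scales, typical) for b in population])
              for a in population]
    best = min(scores)
    return [i for i, s in enumerate(scores) if s == best], scores


if __name__ == "__main__":
    # Hypothetical (height, weight) pairs.
    people = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
    print(average_individuals(people)[0])
    print(average_individuals(people, typical=statistics.mean)[0])
```

Because the result is always a member of the population, it exists by
construction, and several individuals may tie, as several medians may.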


261 G. DARMOIS. Sur la détermination de l’axe d’un nuage
rectiligne de points (On Determining the Axis of a Rectilinear Cloud
of Points).


M. G. Teissier has authorised me to present to you a summary of a piece
of work to which I give the above title.

In 1948, before the first International Biometric Conference and in a
memoir published in “Biometrics”, G. Teissier proposed a fitting procedure
(minimum of a sum of areas of triangles) which leads to the line with
equation:

$$\frac{x_1 - \bar{x}_1}{\sigma_1} = \frac{x_2 - \bar{x}_2}{\sigma_2} \qquad (D)$$

The slope of this line is the geometric mean of the slopes of the
regression lines.

A generalisation of the equation to n variables is immediate and leads
to what has been called the “line of organic correlation”, studied by
Kermack and Haldane (n = 2) and by Kruskal (Biometrics 1953) for
arbitrary n.

G. Teissier returns to this question and proposes to compare this line
(D) with other lines which aim, by various fitting procedures, to provide a
linear summary freed from the perturbations present in the observed
variables.

The example he gives of such a problem is that of n measurements made
on a sample of adult crustaceans, satisfying reasonably well the linear
relations of allometry between the logarithms. One then works with a cloud
in n-dimensional space, very elongated and sensibly rectilinear, of which
one seeks what will be called the axis, making use of the moments of the
first two orders.

Line fitted by centres of gravity. G. Teissier recalls the technique
advised by Wald, and then by Bartlett, and considers that this line, under
certain conditions, would be the same as (D).




Personally, I think not, and that this line, depending on how the
observations are divided up, would give approximately the regression lines.

Line of generalised least squares. Depending on the metric adopted, one
will find the major axis of the ellipsoid of dispersion, and generally the
first principal component of Hotelling. This line (D') satisfies an optimum
condition which extracts the largest fraction of the total variance, but
reproduces the correlation coefficients poorly. In addition, it suffers
from the defect of being rather arbitrary.

The line of the general factor. If $t$ is the general factor and $r_{it}$
the correlation coefficient of $t$ with $x_i$, G. Teissier proposes the
line (D''):

$$\frac{x_1 - \bar{x}_1}{r_{1t}\,\sigma_1} = \cdots = \frac{x_n - \bar{x}_n}{r_{nt}\,\sigma_n}$$

The estimation of the $r_{it}$ requires hypotheses of the Spearman type.
G. Teissier remarks that for 2 variables the problem cannot be solved, and
that from n = 3 onwards one finds that, for example, the relation between
$x_1$ and $x_2$ depends on the third variable.


Conclusions


(D) is suitable only in very special cases.
(D') and (D'') can always be used; (D'') seems preferable, and is perfectly
suitable if there is a single general factor.
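
For two variables, lines (D) and (D') can be compared numerically with the
two regression lines. The sketch below is only an illustration under stated
assumptions: the function name and the data are hypothetical, (D) is taken
as the line whose slope is the geometric mean of the two regression slopes
(with the sign of the correlation), and (D') as the major axis of the
covariance matrix in the raw metric.

```python
import numpy as np


def fitted_axes(x, y):
    """Slopes (dy/dx) of four lines through the centroid of a 2-D cloud."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sx, sy = x.std(ddof=1), y.std(ddof=1)
    r = np.corrcoef(x, y)[0, 1]
    slope_y_on_x = r * sy / sx          # regression of y on x
    slope_x_on_y = sy / (r * sx)        # regression of x on y, as dy/dx
    slope_d = np.sign(r) * sy / sx      # line (D)
    # Line (D'): eigenvector of the largest eigenvalue of the covariance
    # matrix (eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(np.cov(x, y))
    major = eigvecs[:, -1]
    slope_d_prime = major[1] / major[0]
    return {"y_on_x": slope_y_on_x, "x_on_y": slope_x_on_y,
            "D": slope_d, "D_prime": slope_d_prime}


if __name__ == "__main__":
    # Hypothetical log-measurements on five animals.
    print(fitted_axes([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.1]))
```

Changing the units of one variable leaves (D) equivariant but alters (D'),
which is one way of seeing why (D') is described as rather arbitrary.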


262 A. F. PARKER-RHODES, M.A., Ph.D. Estimating Populations
of Irregularly Observable Organisms.


The problem treated is that of estimating the total population of an 
organism in a given area when we cannot assume that all members of the 
population are ever observable simultaneously. There are many exam- 
ples of such cases; my own work concerns the higher Basidiomycetes, 
whose mycelia do not fructify every year and cannot be identified unless 
they do so. 

It is assumed that a permanent and constant population exists in
the given area; cases where the population changes appreciably during 
the period of observation, and where the lifetime of an individual is 
comparable to the phenological periodicity, are excluded. 

The principle of estimation is this. If we have a number $N$ of
consecutively numbered counters, known to be less than $\mathbf{N}$ but
otherwise indeterminate, and sample one of them $t$ times with replacement,
the highest ordinal borne by any of the counters taken, $\eta$, will have a
determinate probability distribution depending on $t$, $N$, and
$\mathbf{N}$. Thus, given $t$, $\eta$, and $\mathbf{N}$, we can estimate $N$
and obtain fiducial limits. The formulae are:

[formulae not legible in the source]
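
The distribution on which the counter argument rests is elementary and can
be sketched directly. The code below gives only the probability that the
highest ordinal seen in $t$ draws with replacement from counters numbered 1
to $N$ equals a given value, and the resulting likelihood for each
admissible $N$; it is not the author's fiducial formulae, and the function
names are assumptions made for the example.

```python
from fractions import Fraction


def prob_highest(h, N, t):
    """P(highest ordinal == h) in t draws with replacement from 1..N."""
    if not 1 <= h <= N:
        return Fraction(0)
    # P(all draws <= h) minus P(all draws <= h - 1).
    return Fraction(h, N) ** t - Fraction(h - 1, N) ** t


def likelihood_profile(h, t, upper):
    """Likelihood of each N from h up to a known upper limit."""
    return {N: prob_highest(h, N, t) for N in range(h, upper + 1)}


if __name__ == "__main__":
    profile = likelihood_profile(h=7, t=4, upper=20)
    for N, value in profile.items():
        print(N, float(value))
```

The likelihood is greatest at $N$ equal to the highest ordinal seen and
falls away slowly for larger $N$, which is the source of the very skew
limits reported further on.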


If the frequency distribution of the fraction of the population
observable over all the occasions when observations are made should be
rectangular, the problem of estimating the total population is formally
identical with the above, provided we know the upper limit $\mathbf{N}$.
But if the actual frequency distribution is $f(x)$, it can be shown that
$\eta(x) = \int f(x)\,dx$ is a rectangular variate over the same field,
provided that $\mathrm{Lt}_{x \to \infty}\,\eta(x)$ is finite; if this
condition fails there is no rectangular variate.

In general the form of $f$ cannot be foreseen; in the case of my data on
basidiomycetes it can be shown to be at least approximately true that

$f(x) =$ [expression not legible in the source]

In general, one of the limitations of the method will be that there
may be no way of discovering $f$. On integration, this value of $f$ gives
an incomplete gamma function:

$\eta(x) =$ [expression not legible in the source]; $\mathrm{Lt}\,\eta(x) = \alpha!$

Of the two parameters, $\beta$ can be expressed in terms of $\alpha$ and
the (duly weighted) mean of the observed numbers of fructifications. The
parameter $\alpha$ requires great labour for its rigorous estimation, but
for the purposes of this method, which is necessarily rather crude,
graphical methods are sufficient.

For each series of observations to be analysed, we first estimate
$\alpha$, and thence compute a series of $\eta$ values corresponding to the
raw $x$’s. From the highest of these we can calculate by the formulae given
figures for $F_{-}[N]$, $E[N]$, and $F_{+}[N]$. We can then employ the
inverse function to $\eta$ in order to estimate from these the expectation
and fiducial limits of the unknown population $X$.

These estimates are of course not strictly correct, since in general
$\eta^{-1}(E[N]) \neq E[\eta^{-1}(N)]$; but the error involved is
relatively trivial. The expectation distribution is extremely skew, so that
upper fiducial limits are often meaningless and represent impossibly large
populations. But the estimates of population which the method gives can be
generally relied upon to within 50% either way. This may not sound very
good, but most of the error is inherent in the nature of the problem, and
more rigorous methods would not usefully reduce it.

Particular examples of the kind of fiducial limits one gets may be 
cited from my work on Skokholm Island, Wales, which I estimate to 
have the following total populations of certain basidiomycetes: 


                              Lower                     Upper
Species                       Fiducial     Estimate     Fiducial
                              Limit, 5%                 Limit, 5%

Naucoria nana                   5,000        7,000       50,000
Panaeolus papilionaceus         1,600        2,000       10,000
Clitocybe fragrans                500        1,400       17,000
Panaeolus campanulatus            150          250          900

(these fiducial limits include also errors of sampling in the raw data). 


263 D. W. GOODALL. (University College of the Gold Coast, 
Achimota, Gold Coast). Factor Analysis in Plant Sociology. 


The methods of factor analysis developed first for the purposes of 
psychology, and subsequently used in a wide variety of fields, are also 
suitable for studying the joint distribution in the field of different species 
of plants, some measure of the quantity of each species of plant present 
in sample areas providing one variable for the analysis. The results of a 
set of observations on an area of desert scrub in south-eastern Australia 
are analysed for purposes of illustration by methods based on those of 
Hotelling, data for fourteen species being used. Some difficulties likely 
to arise in the application of factor analysis to problems of plant sociology 
are discussed. 
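
A minimal sketch of the kind of calculation involved is given below. It
extracts Hotelling-style principal components from the correlation matrix
of species quantities recorded in a set of sample areas. The function name
and the simulated counts are assumptions for the illustration; the abstract
does not give the data or the exact procedure used.

```python
import numpy as np


def principal_components(abundance):
    """Eigenvalues and loadings of the species correlation matrix.

    abundance: 2-D array, rows are sample areas, columns are species.
    """
    corr = np.corrcoef(abundance, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # ascending order
    order = np.argsort(eigvals)[::-1]            # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: correlation of each species with each component.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, loadings


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(5.0, size=(30, 4)).astype(float)  # hypothetical
    values, loadings = principal_components(counts)
    print(values / values.sum())   # share of variance per component
```

Species with large loadings of the same sign on a component tend to occur
together in the sample areas.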


264 A. R. G. OWEN. (Department of Genetics, Cambridge).
Experimental Design in Genetics.


Analytical genetics is a suitable term to describe that kind of 
genetical research where, by controlled breeding of organisms, we follow
the assortment and recombination of sets of discrete Mendelian factors. 
The analysis and interpretation of the data which arise involve a coherent 
system of statistical methods, exhibiting points both of analogy and 
contrast with those so well-known in the realm of agricultural experi- 
mentation. 

In analytical genetics we are concerned fundamentally with the 
estimation of certain pure numbers such as segregation ratios or recom- 
bination fractions; i.e. those parameters which enable the breeding 
behaviour of organisms to be predicted. In agriculture we tend to stress 
the detection of differences between stocks or treatments. This, however,
is clearly a question of estimation and the difference is only one of 
emphasis. 

In agricultural statistics almost everything involves normally
distributed variates, and the prime tool is the analysis of variance, with
the principle of orthogonality as the guiding consideration allowing
individual effects to be distinguished. In genetics the variates are always
whole numbers, being multinomial class frequencies. Taking estimation as
the basic problem, complete coherence of method is achieved by proceeding
from the maximum likelihood theory, using the technique of scores, a
mode of presentation whose power is not yet always fully appreciated.

Scores are linear functions of the class frequencies, and they lead
directly to the analysis of Chi-squared, which is our prime tool and the
counterpart of the analysis of variance. The analysis of Chi-squared
can be arranged to exhibit such features as orthogonality, component
effects, interaction and error terms.
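
A familiar instance of such an analysis, offered here as an illustration
rather than as the author's own example, is the partition of the
three-degree-of-freedom Chi-squared for a 9:3:3:1 ratio in an F2 into
orthogonal single-degree components for the segregation of each factor and
for linkage. Each component is the square of a linear function of the class
frequencies divided by its variance. The function names and counts are
hypothetical.

```python
def partition_chi_squared(n_AB, n_Ab, n_aB, n_ab):
    """Orthogonal components of Chi-squared for an expected 9:3:3:1 ratio."""
    n = n_AB + n_Ab + n_aB + n_ab
    # Linear functions of the class frequencies, each with expectation 0.
    seg_a = n_AB + n_Ab - 3 * n_aB - 3 * n_ab      # 3:1 segregation of A
    seg_b = n_AB - 3 * n_Ab + n_aB - 3 * n_ab      # 3:1 segregation of B
    linkage = n_AB - 3 * n_Ab - 3 * n_aB + 9 * n_ab
    return (seg_a ** 2 / (3 * n), seg_b ** 2 / (3 * n),
            linkage ** 2 / (9 * n))


def total_chi_squared(n_AB, n_Ab, n_aB, n_ab):
    """Ordinary goodness-of-fit Chi-squared against 9:3:3:1 (3 d.f.)."""
    observed = [n_AB, n_Ab, n_aB, n_ab]
    n = sum(observed)
    expected = [9 * n / 16, 3 * n / 16, 3 * n / 16, n / 16]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    counts = (100, 30, 30, 0)                       # hypothetical F2 counts
    print(partition_chi_squared(*counts), total_chi_squared(*counts))
```

The three components add up to the total, in the same way that orthogonal
sums of squares add up in the analysis of variance.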

Both discriminant functions for grouping and the normal theory of 
curvilinear regression have their Chi-squared analogues in this field. 

The Latin Square occurs inevitably and essentially in the design of 
multiple point linkage tests, when we wish to separate Mendelian ratios 
from viability effects, but it is used in a way peculiar to the subject. For 
instance the feature of randomisation is absent. 


265 C. A. B. SMITH. (The Galton Laboratory, University College,
London). The Calculation of Correlation Between Cousins.


For any given character it is possible to find the cousin-cousin corre- 
lation by plotting all points (x, y) in a correlation diagram in the usual 
way, where x, y are the measured values in a pair of cousins. However,
if two sibs have large numbers of children, every child of one will be a 
cousin of every child of the other. This will produce a very large number 
of points in the diagram, which may swamp the contribution of the other 


4 
| 
j 
ts 
] 
| 
| 
| 


ABSTRACTS 185 


cousin pairs. It is better to use a three-stage analysis of variance, giving 
sums of squares (i) between individuals, within sibships, (ii) between 
sibships, within cousinships, (iii) between cousinships. By the use of 
suitable formulae both the sib-sib and cousin-cousin correlations can 
then be found. 
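
The three sums of squares can be computed as in the following sketch, which
takes the data as cousinships containing sibships containing individual
measurements. The function name and the nested-list layout are assumptions
for the illustration; the formulae that convert these sums of squares into
correlations are not given in the abstract and are not attempted here.

```python
def nested_sums_of_squares(data):
    """Three-stage sums of squares for cousinships > sibships > individuals.

    data: list of cousinships; each cousinship is a list of sibships;
    each sibship is a list of measurements.
    Returns (within sibships, between sibships within cousinships,
    between cousinships).
    """
    all_values = [v for cousinship in data
                  for sibship in cousinship for v in sibship]
    grand_mean = sum(all_values) / len(all_values)
    ss_within = ss_sibships = ss_cousinships = 0.0
    for cousinship in data:
        c_values = [v for sibship in cousinship for v in sibship]
        c_mean = sum(c_values) / len(c_values)
        ss_cousinships += len(c_values) * (c_mean - grand_mean) ** 2
        for sibship in cousinship:
            s_mean = sum(sibship) / len(sibship)
            ss_sibships += len(sibship) * (s_mean - c_mean) ** 2
            ss_within += sum((v - s_mean) ** 2 for v in sibship)
    return ss_within, ss_sibships, ss_cousinships


if __name__ == "__main__":
    # Hypothetical measurements: two cousinships of two sibships each.
    families = [[[1.0, 2.0], [3.0, 5.0, 4.0]], [[6.0], [7.0, 9.0]]]
    print(nested_sums_of_squares(families))
```

Each individual enters once, so a pair of very large sibships no longer
dominates the result as it does in the correlation diagram.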


266 G. KARREMAN. (Committee on Mathematical Biology, University
of Chicago). The Mathematical Biology of Threshold
and Related Phenomena in Excitation.


The existence of a threshold is proved for a physico-chemical model 
of membrane permeability. The membrane is supposed to consist of one 
or two molecular layers of a calcium compound (e.g. proteinate or 
lipoproteinate) which is in equilibrium with its ionization products 
(including calcium). In an electrical field the calcium ions are removed 
from the site of chemical action and as a result there is a change in the 
equilibrium concentrations of the calcium compound and its ionization 
products. The electrical potential across the layer(s) is supposed to be 
the superposition of the diffusion potential of potassium and the ex- 
ternally applied potential. A treatment is given of the diffusion of the 
potassium through the membrane, the permeability of which to potassium
is determined by the equilibrium state of the above-mentioned
reactions. It is shown that the system possesses an unstable equilibrium.
From this, the right orders of magnitude are derived for the chemical and
electrical thresholds, the increase in permeability upon excitation and the 
action potential. Excitability curves are derived from the model and 
shown to be in good agreement with experimental evidence. 

Several predictions are made suggesting new experiments. From a 
slight modification of the model repetitive discharges are obtained. The 
order of magnitude of the potential changes derived from them is right 
as well as that of their duration. 


267 M. W. BENTZON. (Statens Seruminstitut, Copenhagen). On
the Statistical Evaluation of Dose Response Curves in Case the
Dose Intervals Are Large.


Two different situations are considered: 


1—Dose response curves with quantal response and 2—Dose response
curves with quantitative response. The mean value and the variance of
various estimates of the median effective dose are considered as functions
of the true median effective dose, the slope of the response curve and the
logarithmic dose interval. (The latter are taken to be equal over the
whole dose range). When the dose intervals are small the ordinary
estimates of the median effective dose usually are unbiased, and the
variance depends upon the slope × interval product only. This rule,
however, breaks down as the intervals are increased, the estimates and
the variances becoming dependent upon the location of the true median
effective dose within a dose interval. This effect is investigated in some
cases of practical interest.
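
The effect can be illustrated with one particular estimate. The sketch
below assumes a logistic response curve and the Spearman-Kärber estimate of
the median effective dose, neither of which is named in the abstract, and
computes the expected value of the estimate on an equally spaced log-dose
grid. The function names and parameter values are hypothetical.

```python
import math


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def expected_spearman_karber(mu, slope, interval, n_doses=41):
    """Expected Spearman-Karber estimate for true median log-dose mu.

    Doses lie at integer multiples of `interval`, centred on zero, and
    are assumed to span the whole response range.
    """
    half = n_doses // 2
    doses = [i * interval for i in range(-half, half + 1)]
    probs = [logistic(slope * (x - mu)) for x in doses]
    # The estimate is linear in the proportions, so its expectation is
    # obtained by inserting the true response probabilities.
    return sum((probs[i + 1] - probs[i]) * (doses[i] + doses[i + 1]) / 2.0
               for i in range(len(doses) - 1))


def bias(mu, slope, interval):
    return expected_spearman_karber(mu, slope, interval) - mu


if __name__ == "__main__":
    for slope in (1.0, 10.0):
        print(slope, [round(bias(m, slope, 1.0), 4) for m in (0.0, 0.25, 0.5)])
```

With a small slope × interval product the bias is negligible wherever the
true median lies; with a large product it vanishes only when the median
falls on a dose or midway between two doses.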


268 i. A. G. KNOWLES. (Dept. of Engineering Production, University
of Birmingham). Experimental Designs in Industry (With
Particular Reference to Production Investigations).


1—Pilot experiments and their interpretation.


Consideration of a variety of different possible interpretations and
representations as an aid to obtaining the best guide for the planning of
the final investigation appears desirable. The variety of interpretations
would often be greater than that which might be thought sufficient from
the point of view of the pure statistician, because the purpose of the
pilot experiment is not only that of giving the best consideration to the
particular objectively known scientific and technical conditions of the
materials and processes, but in addition that of enlisting the co-operation
of all human personalities involved in every aspect of the work, together
with their knowledge and experience.


2—Final investigations, their form and interpretation. 


At this stage, it is desirable to have the mode of interpretation and 
the methods of representation agreed beforehand and strictly adhered to, 
as is usually recommended, so that the validity of the statistical signifi- 
cance tests is not threatened by preferred choices. However, renewed 
experiments with analysis and representation become desirable as soon 
as the results of the investigation lead to the consideration of further 
investigations. 


3—Practical illustration from an investigation connected with tool manu- 
facture. 


The above considerations will be illustrated by means of results 
obtained in a recent industrial investigation in which the writer has 
cooperated, the subject being that of hardness variations of standard 
drills in relation to the various assignable causes given by the raw 
materials and the methods of production and test. 




269 H. C. HAMAKER. (Philips Research Labs., Eindhoven).
Experimental Design in Industry.


1—Once properly understood “experimental design” in the statistical 
sense does mean a minor revolution in our concepts of technological 
experimentation. 

2—That only a very small fraction of this revolution has so far been 
realized is due to the fact that the design of experiment is usually 
presented via the analysis of variance, a method of presentation 
exclusively directed towards the mathematical interpretation of the 
data. Thereby the technological meaning of the analysis is largely 
lost. 

3—It is possible to present the analysis in a very simple way from which 
the connection of the various components with the corresponding 
technological influences can easily be grasped. Such an analysis 
makes sense even without applying tests of significance and probability
theory. These statistical techniques should only be brought into 
play as a final check but should not be seen as the principal aim. 
Statistical jargon should be carefully avoided. The analysis should 
be represented in terms of averages and standard-deviations, instead 
of sums of rows and columns and mean squares. 

Whenever possible the result of an analysis should be presented 
in graphical form. A graph is much more easily understood and 
remembered than a mean square with double asterisk. 

4—The only way in which the technique can be mastered is by using it. 
Hence design of experiment should be taught by showing numerous 
examples of one design and by demonstrating how technological 
conclusions can be drawn from the analysis. Most textbooks give 
only one example and then expatiate largely on the mathematical 
aspects. 

5—The common use of the terms “interaction” and “residue” is confus- 
ing and there is a lack of precise definition. We analyse the data into 
components of the zeroth, first, second, etc. order, while each com- 
ponent may in its turn be composed of (1) systematic effects and 
(2) random fluctuations. Experimentally these can be separated by 
repetition of the experiment. 

6—In agriculture and biology it is usually only possible to carry out one
experiment in a year. Hence there is a need for involved designs in
order to get the maximum information out of one experiment. 
Except with life tests, industrial conditions are essentially different 
in that experiments can be repeated at will. Hence industry does not
require too involved designs, but has a need for wide-scale application 
of the simpler designs. 

7—Application of statistical techniques to industrial problems is heavily 
impeded by an exaggerated drive after exactness. If in industry we 
assume a significance level of 5% it is perfectly satisfactory when the 
actual level lies, say, between 3% and 7%. Without statistical
techniques people are generally inclined grossly to overestimate the 
value of their observations and the useful function of statistics is to 
prevent these gross errors in judgment. 

Industrial conditions are perfectly insensitive, however, against 
variations in the significance level as indicated above. By purposely 
disregarding variations of this order we can tremendously simplify 
statistical techniques and this is of the utmost importance to an 
effective introduction into industry. 
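
The decomposition into components of the zeroth, first and second order
mentioned in point 5 can be written out for a two-way table in terms of
averages only, as point 3 recommends. The sketch below is one possible
reading, with hypothetical names and figures: the zeroth-order component is
the general average, the first-order components are the row and column
deviations, and what remains in each cell is the second-order component.

```python
import statistics


def two_way_components(table):
    """Split a rows x columns table of results into additive components."""
    n_rows, n_cols = len(table), len(table[0])
    general = statistics.mean(v for row in table for v in row)
    row_effects = [statistics.mean(row) - general for row in table]
    col_effects = [statistics.mean(table[i][j] for i in range(n_rows)) - general
                   for j in range(n_cols)]
    # Second-order component: what the averages leave unexplained.
    remainder = [[table[i][j] - general - row_effects[i] - col_effects[j]
                  for j in range(n_cols)] for i in range(n_rows)]
    return general, row_effects, col_effects, remainder


if __name__ == "__main__":
    # Hypothetical results for 2 machines (rows) and 3 materials (columns).
    results = [[10.0, 12.0, 14.0], [20.0, 22.0, 24.0]]
    general, rows, cols, rest = two_way_components(results)
    print(general, rows, cols, rest)
```

Whether the second-order component reflects a systematic effect or random
fluctuation cannot be read from a single table; as the text says, the two
are separated by repeating the experiment.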




THE BIOMETRIC SOCIETY 


General Election. In recent Council balloting, Professor W. G. 
Cochran of Johns Hopkins University, USA, was elected President for 
1954 in succession to Professor Georges Darmois of the University of 
Paris, President during 1952-53, and C. I. Bliss was re-elected Secretary- 
Treasurer for 1954. By mail ballot of the members the following were 
elected to the Council for 1954-56: Georges Darmois, D. J. Finney, 
P. C. Mahalanobis, Donald Mainland, Frank Yates and W. J. Youden. 
The Society is indebted to J. W. Hopkins, N. K. Jerne, Kenneth Mather 
and Margaret Merrell who completed their terms of service on the 
Council in 1953. 

WNAR. At the annual meeting on June 19, 1953, in Stanford, 
California, the following officers were elected for 1954: Regional 
President, Douglas G. Chapman; Secretary-Treasurer, Elizabeth 
Vaughan; and Regional Committee Members to serve through 1956: 
Elizabeth Scott and Mary Elveback. In a joint morning session with 
the Institute of Mathematical Statistics, the following papers were 
presented: D. G. Chapman, “Estimation of Biological Populations”; 
L. LeCam and J. Neyman, “Stochastic Models Related to Experimental 
Studies of Inter-Species and Intra-Species Control”; D. J. Jenden, 
“The Diffusion of Drugs into and through Tissues”; and J. E. Walsh, 
“Some Probability Results for Mortality Rates Based on Insurance 
Data”. A session of contributed papers followed in the afternoon. 

Region for Belgium and the Belgian Congo. This summer, on August 22, 
1953, the Region had the great pleasure and honor of receiving in 
Brussels Professor G. M. Cox, Director of the Institute of Statistics 
of North Carolina. Miss Cox addressed the Society and spoke to us on 
the role of the biometrician in research. Many members of the Society 
came to the Fondation Universitaire to hear this very fine talk. At the 
end of her lecture Miss Cox answered a good many questions, and the 
meeting closed with an extremely lively discussion, of which we all 
retain the fondest memory. 

ENAR. The sixth annual meeting was held at the University of 
Wisconsin in Madison on September 7-9, 1953, jointly with the American 
Institute of Biological Sciences. At the business meeting the following 
officers were named for 1954: Regional President, S. L. Crump; 
Secretary-Treasurer, A. M. Dutton; members of the Regional Committee, 
1954-56, A. B. Chapman and W. T. Federer. The scientific program 
consisted of four sessions. At the opening session of “Contributed 
Papers” on September 7, the speakers were R. G. D. Steel, D. S. Robson, 
Prasert Na Nagara and T. W. Horner. In the afternoon a joint session 
with the Ecological Society of America offered papers by W. E. Ricker; 
E. P. Odum and E. J. Kuenzler; J. S. Olson; and P. J. Clark and F. C. 
Evans. On the evening of September 8 the Region met jointly with the 
American Association of Limnology and Oceanography to hear papers 
by D. B. DeLury, K. Ketchen and E. L. Atwood. On the evening of 
September 9 a joint meeting with the American Phytopathological 
Society offered papers by T. E. Kurtz, R. H. Wellman, and P. E. 
Waggoner and A. E. Dimond. 

On December 27-30, 1953, the Region met jointly with the Bio- 
metrics Section of the American Statistical Association and the Institute 
of Mathematical Statistics in Washington, D.C. At the opening 
session papers were presented by G. E. P. Box and Stuart Hunter on 
“Study and Exploitation of Response Surfaces” and the discussion was 
opened by W. J. Youden and H. O. Hartley. On December 28 a session 
on “Applications of Stochastic Methods to Studies of Growth” was 
addressed by A. T. Reid, A. W. Kimball and A. S. Householder, with 
E. R. Immel as discussant. That afternoon, a paper on “Unsolved 
Problems in Experimental Statistics” by J. W. Tukey was discussed by 
R. L. Anderson and H. Scheffé. Two separate sessions were held on 
Tuesday morning: one consisted of contributed papers by M. Ghosh, 
H. Smith, Jr., A. S. Littell and W. J. Moonan; the other on “Preliminary 
Tests of Significance and Pool Rules” offered papers by A. E. Paull, 
T. A. Bancroft, and D. V. Huntsberger. Of the two sessions on Decem- 
ber 30, the first on “Application of Survivorship Methods” featured 
papers by C. A. Bachrach and D. J. Davis, and the second on 
“Estimation of Rates” listed papers by J. Berkson, E. Fix, W. R. Gaffey and 
W. F. Taylor. 

French Region. At the meeting of the Region on December 2, 1953, 
at the Zoology Laboratory of the École Normale Supérieure, R. Husson 
spoke on “An Example of a Biological Law, Statistical in Character with 
Respect to Certain Variables, Giving Way to a Functional Law through 
an Appropriate Choice of Variable” and J. Arnoux on “The Application 
of the Technique of Transformations, Based on Some Trials in 
Agricultural Entomology”. 

The next meeting of the Society took place on February 3, 1954. 
J. M. Faverge spoke on “An Example of Adapting the Analysis of 
Variance to a Psychological Problem”, and J. M. Legay on “The 
Biometric Aspect in the Study of Behavior in the Silkworm”. During 
this meeting, Monsieur Georges Darmois was elected President 
and Mademoiselle Germaine Cousin was elected Member of the Council. 

British Region. The Annual Meeting was held at the Wellcome 
Research Institution in London on December 17, 1953. The business 
meeting elected the following officers for 1954: Regional President, 
R. R. Race; Secretary, E. C. Fieller; Treasurer, A. R. G. Owen. An 
ordinary meeting followed, at which a visiting member, C. I. Bliss, read 
a paper entitled: “Experiments on the Recovery Time of an Insect 
from Sublethal Doses of a Toxicant”. 

Germany. The members of The Biometric Society in Germany held 
their second meeting and first Biometric Colloquy at the Kerckhoff 
Institute in Bad Nauheim on January 15-17, 1954, with more than 80 
persons in attendance, among them 29 members of the Society. The 
first day’s program on “Statistical inference” offered in the morning 
introductory reports by H. Münzer, R. Wette, R. K. Bauer and H. 
Gebelein; and in the afternoon original papers by H. Richter, E. Walter, 
and H. W. von Guérard. The second day’s program on “Design of 
experiment” offered in the morning introductory reports by H. Geidel, 
A. Lein, S. Koller, E. Welte, H. J. Heite, and W. Sieckmann; and in the 
afternoon original papers by P. Ihm, A. Mudra, H. Döring, and F. 
Keiter. On the third day a business meeting was followed by a 
discussion on “Biometric teaching” with contributions by W. Ludwig, 
K. J. von Solth, S. Koller, H. J. Heite, H. Gebelein, and M. P. Geppert. 

Australasian Region. A very successful biennial meeting was held 
at Canberra in January 1954 in conjunction with a biological section of 
the Australian and New Zealand Association for the Advancement of 
Science. Four papers were presented and will be published in abstract 
in an early issue of Biometrics. 




NOTES 


Summer Sessions at Berkeley, California 


This year’s program at the Statistical Laboratory of the University 
of California, Berkeley, California, consists of two sessions: June 21- 
July 31 and August 2-September 11, 1954. The faculty of the summer 
sessions will include Professor C. R. Rao of Presidency College, Calcutta; 
Professor J. Neyman, Professor Harry M. Hughes and Professor Terry 
A. Jeeves of the Statistical Laboratory, University of California. 

The program includes two of the usual undergraduate courses in each 
session. In addition Professor Rao will give a new course designed to 
acquaint students with multivariate analysis, including analysis of 
variance and covariance, factor analysis, and discriminant functions. 
Professor Neyman will be available for consultations on work leading 
to higher degrees. Further information may be obtained by writing 
the Statistical Laboratory, 5416 Dwinelle Hall, University of California, 
Berkeley 4, California. 


Summer Statistical Seminar 


The fifth Summer Seminar in Statistics will be held at the University 
of Connecticut, August 9 through 27, 1954. This Seminar is designed 
as a meeting place for statisticians with people in industry, commerce 
or the physical sciences. It is hoped that the statisticians will learn 
something of the statistical accomplishments and needs in the field of 
application while those in application will discover some of the new and 
powerful methods of experimental statistics. The plan of the Seminar 
is to have each day a morning and afternoon session each of about two 
hours. An invited speaker will introduce a topic which will later be 
the subject of general discussion. 

Anyone interested in the subjects under discussion (as below) is 
invited to attend for the day, week, or other period. Further informa- 
tion may be obtained from the Secretary, Professor Geoffrey Beall, 
Department of Statistics, University of Connecticut, Storrs, Connecticut. 

The program will be as follows: 


First week, August 9 through 13: Statistical Theories of Choice. 
Organizer: Professor David Blackwell, Department of Mathematics, Howard 
University, Washington, D.C. 

Second week (1st half), August 16 through 18: Applications of Statistics 
in Social Research. Organizer: Professor Walter T. Federer, Department 
of Plant Breeding, Cornell University Agricultural Experiment 
Station, Ithaca, New York. 

Second week (2nd half), August 19 through 21: Applications of Statistics 
in Meteorology. Organizer: Professor Max Woodbury, Department of 
Statistics, University of Pennsylvania, Philadelphia, Pennsylvania. 

Third week, August 23 through 27. Organizers: Professor John W. 
Tukey, Fine Hall, Princeton University, together with Professor George 
Kimball, Department of Chemistry, Columbia University, New York, 
N.Y. The session will be joint with the Operations Research Society of 
America. 


Meeting of The Biometric Society WNAR, June 19, 1954. 


WNAR, Regional Meeting, California Institute of Technology, 
June 19, 1954. The joint regional meetings of The Biometric Society 
(WNAR) and of the Institute of Mathematical Statistics have been 
scheduled for California Institute of Technology, Pasadena, California 
on June 18 and 19, 1954. The special invited address to the Society 
will be given by Dr. Lester Breslow on “Methodological considerations 
in relating cigarette smoking to lung cancer.” There will be sessions on 
the teaching of biostatistics and for contributed papers. Contributions 
to either of these sessions are invited. Abstracts of contributed papers 
to be presented at the meeting should be mailed to the Program Chair- 
man, Mrs. Bernice B. Brown, RAND Corporation, Santa Monica, 
California by May 15, 1954. Dormitory rooms will be available for 
Thursday, Friday and Saturday nights at $3.50 per person per day. 
Those who require rooms should send reservations to Professor S. 
Karlin, Department of Mathematics, California Institute of Technology, 
Pasadena, California. 




